[FINAL WARNING] kvmarm@lists.cs.columbia.edu going away

2023-01-06 Thread Marc Zyngier
Hi folks,

After many years of awesome service, the kvmarm mailing list hosted by
Columbia is being decommissioned. No new email will be archived on
lore.kernel.org, and I am placing the old list under emergency
moderation *NOW*.

If you haven't yet subscribed to the new kvmarm list and still want to
be involved, please read below!

The new list is hosted by the Linux Foundation at

kvmarm@lists.linux.dev

and can be subscribed to by sending an email to:

kvmarm+subscribe@lists.linux.dev

More details can be found at:

https://subspace.kernel.org/lists.linux.dev.html

I'm looking forward to seeing you all on the new list!

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


[GIT PULL] KVM/arm64 fixes for 6.2, take #1

2023-01-05 Thread Marc Zyngier
Hi Paolo,

Happy new year!

Here's the first batch of fixes for KVM/arm64 for 6.2. We have two
important fixes this time around, one for the PMU emulation, and the
other for guest page table walks in read-only memslots, something that
EFI has started doing...

The rest is mostly documentation updates, cleanups, and an update to
the list of reviewers (Alexandru stepping down, and Zenghui joining
the fun).

Please pull,

M.


The following changes since commit 88603b6dc419445847923fcb7fe5080067a30f98:

  Linux 6.2-rc2 (2023-01-01 13:53:16 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvmarm-fixes-6.2-1

for you to fetch changes up to de535c0234dd2dbd9c790790f2ca1c4ec8a52d2b:

  Merge branch kvm-arm64/MAINTAINERS into kvmarm-master/fixes (2023-01-05 
15:26:53 +)


KVM/arm64 fixes for 6.2, take #1

- Fix the PMCR_EL0 reset value after the PMU rework

- Correctly handle S2 fault triggered by a S1 page table walk
  by not always classifying it as a write, as this breaks on
  R/O memslots

- Document why we cannot exit with KVM_EXIT_MMIO when taking
  a write fault from a S1 PTW on a R/O memslot

- Put the Apple M2 on the naughty step for not being able to
  correctly implement the vgic SEIS feature, just like the M1
  before it

- Reviewer updates: Alex is stepping down, replaced by Zenghui


Alexandru Elisei (1):
  MAINTAINERS: Remove myself as a KVM/arm64 reviewer

James Clark (1):
  KVM: arm64: PMU: Fix PMCR_EL0 reset value

Marc Zyngier (8):
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS 
implementations
  Merge branch kvm-arm64/pmu-fixes-6.2 into kvmarm-master/fixes
  Merge branch kvm-arm64/s1ptw-write-fault into kvmarm-master/fixes
  MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
  Merge branch kvm-arm64/MAINTAINERS into kvmarm-master/fixes

 Documentation/virt/kvm/api.rst  |  8 +++
 MAINTAINERS |  2 +-
 arch/arm64/include/asm/cputype.h|  4 
 arch/arm64/include/asm/esr.h|  9 +++
 arch/arm64/include/asm/kvm_arm.h| 15 
 arch/arm64/include/asm/kvm_emulate.h| 42 +++--
 arch/arm64/kvm/hyp/include/hyp/fault.h  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/mmu.c| 21 ++---
 arch/arm64/kvm/sys_regs.c   |  2 +-
 arch/arm64/kvm/vgic/vgic-v3.c   |  2 ++
 11 files changed, 69 insertions(+), 40 deletions(-)


Re: [PATCH rcu 02/27] arch/arm64/kvm: Remove "select SRCU"

2023-01-05 Thread Marc Zyngier
On Thu, 05 Jan 2023 00:37:48 +,
"Paul E. McKenney"  wrote:
> 
> Now that the SRCU Kconfig option is unconditionally selected, there is
> no longer any point in selecting it.  Therefore, remove the "select SRCU"
> Kconfig statements.
> 
> Signed-off-by: Paul E. McKenney 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Alexandru Elisei 
> Cc: Suzuki K Poulose 
> Cc: Oliver Upton 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: 
> Cc: 
> Cc: 
> ---
>  arch/arm64/kvm/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 05da3c8f7e88f..312f0e9869111 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -28,7 +28,6 @@ menuconfig KVM
>   select KVM_MMIO
>   select KVM_GENERIC_DIRTYLOG_READ_PROTECT
>   select KVM_XFER_TO_GUEST_WORK
> - select SRCU
>   select KVM_VFIO
>   select HAVE_KVM_EVENTFD
>   select HAVE_KVM_IRQFD

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/3] KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations

2023-01-03 Thread Marc Zyngier
Hi Nathan,

On Tue, 03 Jan 2023 17:59:12 +,
Nathan Chancellor  wrote:
> 
> Hi Marc,
> 
> I have only been lurking on the kvmarm mailing list for a little bit and
> it has mainly just been reading patches/review to get more familiar with
> various virtualization concepts so I apologize if the following review
> is rather shallow but...

No need to apologise. Any extra pair of eyes is welcome, especially when
the idiot behind the keyboard writes stuff like the patch below...

> 
> On Tue, Jan 03, 2023 at 09:50:20AM +, Marc Zyngier wrote:
> > I really hoped that Apple had fixed their not-quite-a-vgic implementation
> > when moving from M1 to M2. Alas, it seems they didn't, and running
> > a buggy EFI version results in the vgic generating SErrors outside
> > of the guest and taking the host down.
> > 
> > Apply the same workaround as for M1. Yes, this is all a bit crap.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/include/asm/cputype.h | 4 
> >  arch/arm64/kvm/vgic/vgic-v3.c| 3 ++-
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/cputype.h 
> > b/arch/arm64/include/asm/cputype.h
> > index 4e8b66c74ea2..683ca3af4084 100644
> > --- a/arch/arm64/include/asm/cputype.h
> > +++ b/arch/arm64/include/asm/cputype.h
> > @@ -124,6 +124,8 @@
> >  #define APPLE_CPU_PART_M1_FIRESTORM_PRO0x025
> >  #define APPLE_CPU_PART_M1_ICESTORM_MAX 0x028
> >  #define APPLE_CPU_PART_M1_FIRESTORM_MAX0x029
> > +#define APPLE_CPU_PART_M2_BLIZZARD 0x032
> > +#define APPLE_CPU_PART_M2_AVALANCHE0x033
> >  
> >  #define AMPERE_CPU_PART_AMPERE10xAC3
> >  
> > @@ -177,6 +179,8 @@
> >  #define MIDR_APPLE_M1_FIRESTORM_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
> > APPLE_CPU_PART_M1_FIRESTORM_PRO)
> >  #define MIDR_APPLE_M1_ICESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
> > APPLE_CPU_PART_M1_ICESTORM_MAX)
> >  #define MIDR_APPLE_M1_FIRESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
> > APPLE_CPU_PART_M1_FIRESTORM_MAX)
> > +#define MIDR_APPLE_M2_BLIZZARD MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
> > APPLE_CPU_PART_M2_BLIZZARD)
> > +#define MIDR_APPLE_M2_AVALANCHE MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
> > APPLE_CPU_PART_M2_AVALANCHE)
> >  #define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, 
> > AMPERE_CPU_PART_AMPERE1)
> >  
> >  /* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
> > diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> > index 826ff6f2a4e7..c6442b08fe80 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v3.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> > @@ -615,7 +615,8 @@ static const struct midr_range broken_seis[] = {
> > MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_PRO),
> > MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_PRO),
> > MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_MAX),
> > -   MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_MAX),
> 
> The commit message makes no note of this removal, was it intentional?
> MIDR_APPLE_M1_FIRESTORM_MAX is only used here so I assume it is not.

Absolutely not intentional :-/ Thanks a lot for spotting this!

I'll fix this immediately (good thing I didn't send the fixes PR!).

Thanks again,

   M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/1] KVM: arm64: PMU: Fix PMCR_EL0 reset value

2023-01-03 Thread Marc Zyngier
On Fri, 9 Dec 2022 16:44:45 +, James Clark wrote:
> We noticed qemu failing to run because of an assert on our CI. I don't see 
> the issue anymore with
> this fix.
> 
> Applies to kvmarm/next (753d734f3f34)
> 
> Thanks
> 
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: PMU: Fix PMCR_EL0 reset value
  commit: edb4929ea9ba48cd91e3867041f49e4c34d729ed

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: (subset) [PATCH 0/3] KVM: arm64: M2 vgic maintenance interrupt rework pre-NV

2023-01-03 Thread Marc Zyngier
On Tue, 3 Jan 2023 09:50:19 +, Marc Zyngier wrote:
> I've spent the holiday break reviving the Nested Virt KVM/arm64
> implementation[1] and allowing it to work on the Apple M2 SoC. The
> amusing part is that it actually works!
> 
> However, the way the vgic is implemented on this HW is still at odds
> with the rest of the architecture, and requires some hacks, some of
> which are independent of the actual NV code. This is what this series
> is about.
> 
> [...]

Applied to fixes, thanks!

[1/3] KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS 
implementations
  commit: 4f6202c9fb51cc6a2593ad37d8ddff136b7acef2

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [PATCH] MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer

2023-01-03 Thread Marc Zyngier
On Tue, 3 Jan 2023 12:39:33 +, Marc Zyngier wrote:
> Zenghui has been around for quite some time, and has been instrumental
> in reviewing the GICv4/4.1 KVM support. I'm delighted that he's agreed
> to help with the patch review in a more official capacity!

Applied to fixes, thanks!

[1/1] MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
  commit: f4d488bcbeedf5f625904beef0e1e55d85cb29c9

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [PATCH] MAINTAINERS: Remove myself as a KVM/arm64 reviewer

2023-01-03 Thread Marc Zyngier
On Tue, 03 Jan 2023 12:07:36 +,
Alexandru Elisei  wrote:
> 
> Haven't done any meaningful reviews for more than a year, and it doesn't
> look like I'll be able to do so in the future. Make it official and remove
> myself from the KVM/arm64 "Reviewers" list.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7f86d02cb427..813673637500 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11356,7 +11356,6 @@ F:virt/kvm/*
>  KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
>  M:   Marc Zyngier 
>  R:   James Morse 
> -R:   Alexandru Elisei 
>  R:   Suzuki K Poulose 
>  R:   Oliver Upton 
>  L:   linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)

[+ the old kvmarm list, as the new one isn't archived yet]

I'm sad to see you go, but hopefully we'll still count you as a
contributor. Thanks again for all your work over the years.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/7] KVM: arm64: PMU: Allow userspace to limit the number of PMCs on vCPU

2023-01-03 Thread Marc Zyngier
On Tue, 03 Jan 2023 12:40:34 +,
Jonathan Cameron  wrote:
> 
> On Thu, 29 Dec 2022 19:59:21 -0800
> Reiji Watanabe  wrote:
> 
> > The goal of this series is to allow userspace to limit the number
> > of PMU event counters on the vCPU.
> 
> Hi Reiji,
> 
> Why do you want to do this?
> 
> I can conjecture a bunch of possible reasons, but they may not
> match up with your use case. It would be useful to have that information
> in the cover letter.

The most obvious use case is to support migration across systems that
implement different numbers of counters. Similar reasoning could also
apply to the debug infrastructure (watchpoints, breakpoints).

In any case, being able to decouple the VM from the underlying HW
to the extent that the architecture permits seems like a
valuable goal.
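
For illustration, the number of implemented counters is advertised in
PMCR_EL0.N, which is why it can differ between hosts. A minimal sketch
of reading it (assumes EL1 context, or EL0 with PMUSERENR_EL0.EN set by
the kernel; not KVM code, just an illustration):

#include <stdint.h>
#include <stdio.h>

/* Read PMCR_EL0. From EL0 this traps unless the kernel has enabled
 * user access via PMUSERENR_EL0.EN, so treat this as a sketch. */
static inline uint64_t read_pmcr_el0(void)
{
	uint64_t val;

	asm volatile("mrs %0, pmcr_el0" : "=r" (val));
	return val;
}

int main(void)
{
	/* PMCR_EL0.N, bits [15:11]: number of event counters implemented. */
	unsigned int nr_counters = (read_pmcr_el0() >> 11) & 0x1f;

	printf("%u PMU event counters implemented on this CPU\n", nr_counters);
	return 0;
}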

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


[PATCH] MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer

2023-01-03 Thread Marc Zyngier
Zenghui has been around for quite some time, and has been instrumental
in reviewing the GICv4/4.1 KVM support. I'm delighted that he's agreed
to help with the patch review in a more official capacity!

Cc: Zenghui Yu 
Signed-off-by: Marc Zyngier 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f61eb221415b..551544d877a3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11359,6 +11359,7 @@ R:  James Morse 
 R: Alexandru Elisei 
 R: Suzuki K Poulose 
 R: Oliver Upton 
+R: Zenghui Yu 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 L: kvm...@lists.linux.dev
 L: kvmarm@lists.cs.columbia.edu (deprecated, moderated for non-subscribers)
-- 
2.34.1



[PATCH v2 3/3] KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*

2023-01-03 Thread Marc Zyngier
The former is an AArch32 legacy, so let's move over to the
verbose (and strictly identical) version.

This involves moving some of the #defines that were private
to KVM into the more generic esr.h.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/esr.h|  9 +
 arch/arm64/include/asm/kvm_arm.h| 15 ---
 arch/arm64/include/asm/kvm_emulate.h| 20 ++--
 arch/arm64/kvm/hyp/include/hyp/fault.h  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/mmu.c| 21 -
 6 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 15b34fbfca66..206de10524e3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -114,6 +114,15 @@
 #define ESR_ELx_FSC_ACCESS (0x08)
 #define ESR_ELx_FSC_FAULT  (0x04)
 #define ESR_ELx_FSC_PERM   (0x0C)
+#define ESR_ELx_FSC_SEA_TTW0   (0x14)
+#define ESR_ELx_FSC_SEA_TTW1   (0x15)
+#define ESR_ELx_FSC_SEA_TTW2   (0x16)
+#define ESR_ELx_FSC_SEA_TTW3   (0x17)
+#define ESR_ELx_FSC_SECC   (0x18)
+#define ESR_ELx_FSC_SECC_TTW0  (0x1c)
+#define ESR_ELx_FSC_SECC_TTW1  (0x1d)
+#define ESR_ELx_FSC_SECC_TTW2  (0x1e)
+#define ESR_ELx_FSC_SECC_TTW3  (0x1f)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT  (24)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 0df3fc3a0173..26b0c97df986 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -319,21 +319,6 @@
 BIT(18) |  \
 GENMASK(16, 15))
 
-/* For compatibility with fault code shared with 32-bit */
-#define FSC_FAULT  ESR_ELx_FSC_FAULT
-#define FSC_ACCESS ESR_ELx_FSC_ACCESS
-#define FSC_PERM   ESR_ELx_FSC_PERM
-#define FSC_SEAESR_ELx_FSC_EXTABT
-#define FSC_SEA_TTW0   (0x14)
-#define FSC_SEA_TTW1   (0x15)
-#define FSC_SEA_TTW2   (0x16)
-#define FSC_SEA_TTW3   (0x17)
-#define FSC_SECC   (0x18)
-#define FSC_SECC_TTW0  (0x1c)
-#define FSC_SECC_TTW1  (0x1d)
-#define FSC_SECC_TTW2  (0x1e)
-#define FSC_SECC_TTW3  (0x1f)
-
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK (~UL(0xf))
 /*
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 0d40c48d8132..193583df2d9c 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -349,16 +349,16 @@ static __always_inline u8 
kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *v
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
switch (kvm_vcpu_trap_get_fault(vcpu)) {
-   case FSC_SEA:
-   case FSC_SEA_TTW0:
-   case FSC_SEA_TTW1:
-   case FSC_SEA_TTW2:
-   case FSC_SEA_TTW3:
-   case FSC_SECC:
-   case FSC_SECC_TTW0:
-   case FSC_SECC_TTW1:
-   case FSC_SECC_TTW2:
-   case FSC_SECC_TTW3:
+   case ESR_ELx_FSC_EXTABT:
+   case ESR_ELx_FSC_SEA_TTW0:
+   case ESR_ELx_FSC_SEA_TTW1:
+   case ESR_ELx_FSC_SEA_TTW2:
+   case ESR_ELx_FSC_SEA_TTW3:
+   case ESR_ELx_FSC_SECC:
+   case ESR_ELx_FSC_SECC_TTW0:
+   case ESR_ELx_FSC_SECC_TTW1:
+   case ESR_ELx_FSC_SECC_TTW2:
+   case ESR_ELx_FSC_SECC_TTW3:
return true;
default:
return false;
diff --git a/arch/arm64/kvm/hyp/include/hyp/fault.h 
b/arch/arm64/kvm/hyp/include/hyp/fault.h
index 1b8a2dcd712f..9ddcfe2c3e57 100644
--- a/arch/arm64/kvm/hyp/include/hyp/fault.h
+++ b/arch/arm64/kvm/hyp/include/hyp/fault.h
@@ -60,7 +60,7 @@ static inline bool __get_fault_info(u64 esr, struct 
kvm_vcpu_fault_info *fault)
 */
if (!(esr & ESR_ELx_S1PTW) &&
(cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
-(esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+(esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM)) {
if (!__translate_far_to_hpfar(far, ))
return false;
} else {
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 3330d1b76bdd..07d37ff88a3f 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -367,7 +367,7 @@ static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, 
u64 *exit_code)
if (static_branch_unlikely(_v2_cpuif_trap)) {
bool valid;
 
-   valid = kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+   valid = kvm_vcpu_trap_get_fault_type(vcpu) == ESR_ELx_FSC_FAULT 
&&
kvm_vcpu_dabt_isvalid(vcpu) &&
!kvm_vcpu_abt_issea(vcpu) &&
!kvm_vcpu_abt_iss1tw(vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..a3ee3

[PATCH v2 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

2023-01-03 Thread Marc Zyngier
A recent development on the EFI front has resulted in guests having
their page tables baked in the firmware binary, and mapped into the
IPA space as part of a read-only memslot. Not only is this legitimate,
but it also results in added security, so thumbs up.

It is possible to take an S1PTW translation fault if the S1 PTs are
unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
write to correctly handle hardware AF/DB updates to the S1 PTs.
Furthermore, KVM injects an exception into the guest for S1PTW writes.
In the aforementioned case this results in the guest taking an abort
it won't recover from, as the S1 PTs mapping the vectors suffer from
the same problem.

So clearly our handling is... wrong.

Instead, switch to a two-pronged approach:

- On S1PTW translation fault, handle the fault as a read

- On S1PTW permission fault, handle the fault as a write

This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use HW-assisted AF/DB anyway, as that'd be wrong.

Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end-up
with two back-to-back faults (page being evicted and faulted back).
I don't think this is a case worth optimising for.

Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault 
on instruction fetch")
Reviewed-by: Oliver Upton 
Reviewed-by: Ard Biesheuvel 
Regression-tested-by: Ard Biesheuvel 
Signed-off-by: Marc Zyngier 
Cc: sta...@vger.kernel.org
---
 arch/arm64/include/asm/kvm_emulate.h | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..0d40c48d8132 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
-   if (kvm_vcpu_abt_iss1tw(vcpu))
-   return true;
+   if (kvm_vcpu_abt_iss1tw(vcpu)) {
+   /*
+* Only a permission fault on a S1PTW should be
+* considered as a write. Otherwise, page tables baked
+* in a read-only memslot will result in an exception
+* being delivered in the guest.
+*
+* The drawback is that we end-up faulting twice if the
+* guest is using any of HW AF/DB: a translation fault
+* to map the page containing the PT (read only at
+* first), then a permission fault to allow the flags
+* to be set.
+*/
+   switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+   case ESR_ELx_FSC_PERM:
+   return true;
+   default:
+   return false;
+   }
+   }
 
if (kvm_vcpu_trap_is_iabt(vcpu))
return false;
-- 
2.34.1



[PATCH v2 2/3] KVM: arm64: Document the behaviour of S1PTW faults on RO memslots

2023-01-03 Thread Marc Zyngier
Although the KVM API says that a write to a RO memslot must result
in a KVM_EXIT_MMIO describing the write, the arm64 architecture
doesn't provide the *data* written by a Stage-1 page table walk
(we only get the address).

Since there isn't much userspace can do with so little information
anyway, document the fact that such an access results in a guest
exception, not an exit. This is consistent with the guest being
terminally broken anyway.

Reviewed-by: Oliver Upton 
Signed-off-by: Marc Zyngier 
---
 Documentation/virt/kvm/api.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0dd5d8733dd5..42db72a0cbe6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1354,6 +1354,14 @@ the memory region are automatically reflected into the 
guest.  For example, an
 mmap() that affects the region will be made visible immediately.  Another
 example is madvise(MADV_DROP).
 
+Note: On arm64, a write generated by the page-table walker (to update
+the Access and Dirty flags, for example) never results in a
+KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
+is because KVM cannot provide the data that would be written by the
+page-table walker, making it impossible to emulate the access.
+Instead, an abort (data abort if the cause of the page-table update
+was a load or a store, instruction abort if it was an instruction
+fetch) is injected in the guest.
 
 4.36 KVM_SET_TSS_ADDR
 -
-- 
2.34.1



[PATCH v2 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots

2023-01-03 Thread Marc Zyngier
Recent developments on the EFI front have resulted in guests that
simply won't boot if the page tables are in a read-only memslot and
you're a bit unlucky in the way S2 gets paged in... The core issue is
that we treat a S1PTW as a write, which is close enough to what needs
to be done, until you get to RO memslots.
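
As a concrete reference for what "read-only memslot" means here: memory
the VMM registers with the KVM_MEM_READONLY flag. A minimal userspace
sketch (slot number and addresses are made up for the example):

#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Expose a firmware image (baked-in page tables included) to the guest
 * as a read-only memslot. Guest writes to this range normally come
 * back to userspace as KVM_EXIT_MMIO; writes generated by the stage-1
 * page-table walker are the exception discussed in this series. */
static int map_ro_flash(int vm_fd, void *host_buf, size_t size)
{
	struct kvm_userspace_memory_region region = {
		.slot            = 1,
		.flags           = KVM_MEM_READONLY,
		.guest_phys_addr = 0x0,		/* flash base in IPA space */
		.memory_size     = size,
		.userspace_addr  = (unsigned long)host_buf,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}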

The first patch fixes this and is definitely a stable candidate. It
splits the faulting of page tables into two steps (RO translation fault,
followed by a writable permission fault -- should it even happen).
The second one documents the slightly odd behaviour of PTW writes to
RO memslots, which do not result in a KVM_EXIT_MMIO exit. The last patch
is totally optional, only tangentially related, and randomly repainting
stuff (maybe that's contagious, who knows).

The whole thing is on top of v6.1-rc2.

I plan to take this in as a fix shortly.

M.

* From v1:

  - Added the documentation patch

  - Dropped the AF micro-optimisation, as it was creating more
confusion, was hard to test, and was of dubious value

  - Collected RBs, with thanks

Marc Zyngier (3):
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*

 Documentation/virt/kvm/api.rst  |  8 +
 arch/arm64/include/asm/esr.h|  9 ++
 arch/arm64/include/asm/kvm_arm.h| 15 -
 arch/arm64/include/asm/kvm_emulate.h| 42 ++---
 arch/arm64/kvm/hyp/include/hyp/fault.h  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/mmu.c| 21 +++--
 7 files changed, 61 insertions(+), 38 deletions(-)

-- 
2.34.1



[PATCH 3/3] irqchip/apple-aic: Register vgic maintenance interrupt with KVM

2023-01-03 Thread Marc Zyngier
In order to deliver vgic maintenance interrupts that Nested Virt
requires, hook it into the FIQ space, even if it is delivered
as an IRQ (we don't distinguish between the two anyway).

Signed-off-by: Marc Zyngier 
---
 drivers/irqchip/irq-apple-aic.c | 55 +
 1 file changed, 42 insertions(+), 13 deletions(-)

diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c
index ae3437f03e6c..09fd52d91e45 100644
--- a/drivers/irqchip/irq-apple-aic.c
+++ b/drivers/irqchip/irq-apple-aic.c
@@ -210,7 +210,6 @@
 FIELD_PREP(AIC_EVENT_NUM, x))
 #define AIC_HWIRQ_IRQ(x)   FIELD_GET(AIC_EVENT_NUM, x)
 #define AIC_HWIRQ_DIE(x)   FIELD_GET(AIC_EVENT_DIE, x)
-#define AIC_NR_FIQ 6
 #define AIC_NR_SWIPI   32
 
 /*
@@ -222,11 +221,18 @@
  * running at EL2 (with VHE). When the kernel is running at EL1, the
  * mapping differs and aic_irq_domain_translate() performs the remapping.
  */
-
-#define AIC_TMR_EL0_PHYS   AIC_TMR_HV_PHYS
-#define AIC_TMR_EL0_VIRT   AIC_TMR_HV_VIRT
-#define AIC_TMR_EL02_PHYS  AIC_TMR_GUEST_PHYS
-#define AIC_TMR_EL02_VIRT  AIC_TMR_GUEST_VIRT
+enum fiq_hwirq {
+   /* Must be ordered as in apple-aic.h */
+   AIC_TMR_EL0_PHYS= AIC_TMR_HV_PHYS,
+   AIC_TMR_EL0_VIRT= AIC_TMR_HV_VIRT,
+   AIC_TMR_EL02_PHYS   = AIC_TMR_GUEST_PHYS,
+   AIC_TMR_EL02_VIRT   = AIC_TMR_GUEST_VIRT,
+   AIC_CPU_PMU_Effi= AIC_CPU_PMU_E,
+   AIC_CPU_PMU_Perf= AIC_CPU_PMU_P,
+   /* No need for this to be discovered from DT */
+   AIC_VGIC_MI,
+   AIC_NR_FIQ
+};
 
 static DEFINE_STATIC_KEY_TRUE(use_fast_ipi);
 
@@ -384,14 +390,20 @@ static void __exception_irq_entry aic_handle_irq(struct 
pt_regs *regs)
 
/*
 * vGIC maintenance interrupts end up here too, so we need to check
-* for them separately. This should never trigger if KVM is working
-* properly, because it will have already taken care of clearing it
-* on guest exit before this handler runs.
+* for them separately. It should however only trigger when NV is
+* in use, and be cleared when coming back from the handler.
 */
-   if (is_kernel_in_hyp_mode() && (read_sysreg_s(SYS_ICH_HCR_EL2) & 
ICH_HCR_EN) &&
-   read_sysreg_s(SYS_ICH_MISR_EL2) != 0) {
-   pr_err_ratelimited("vGIC IRQ fired and not handled by KVM, 
disabling.\n");
-   sysreg_clear_set_s(SYS_ICH_HCR_EL2, ICH_HCR_EN, 0);
+   if (is_kernel_in_hyp_mode() &&
+   (read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) &&
+   read_sysreg_s(SYS_ICH_MISR_EL2) != 0) {
+   generic_handle_domain_irq(aic_irqc->hw_domain,
+ AIC_FIQ_HWIRQ(AIC_VGIC_MI));
+
+   if (unlikely((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) &&
+read_sysreg_s(SYS_ICH_MISR_EL2))) {
+   pr_err_ratelimited("vGIC IRQ fired and not handled by 
KVM, disabling.\n");
+   sysreg_clear_set_s(SYS_ICH_HCR_EL2, ICH_HCR_EN, 0);
+   }
}
 }
 
@@ -1178,6 +1190,23 @@ static int __init aic_of_ic_init(struct device_node 
*node, struct device_node *p
  "irqchip/apple-aic/ipi:starting",
  aic_init_cpu, NULL);
 
+   if (is_kernel_in_hyp_mode()) {
+   struct irq_fwspec mi = {
+   .fwnode = of_node_to_fwnode(node),
+   .param_count= 3,
+   .param  = {
+   [0] = AIC_FIQ, /* This is a lie */
+   [1] = AIC_VGIC_MI,
+   [2] = IRQ_TYPE_LEVEL_HIGH,
+   },
+   };
+
+   vgic_info.maint_irq = irq_domain_alloc_irqs(irqc->hw_domain,
+   1, NUMA_NO_NODE,
+   );
+   WARN_ON(!vgic_info.maint_irq);
+   }
+
vgic_set_kvm_info(_info);
 
pr_info("Initialized with %d/%d IRQs * %d/%d die(s), %d FIQs, %d vIPIs",
-- 
2.34.1



[PATCH 0/3] KVM: arm64: M2 vgic maintenance interrupt rework pre-NV

2023-01-03 Thread Marc Zyngier
Hi all,

I've spent the holiday break reviving the Nested Virt KVM/arm64
implementation[1] and allowing it to work on the Apple M2 SoC. The
amusing part is that it actually works!

However, the way the vgic is implemented on this HW is still at odds
with the rest of the architecture, and requires some hacks, some of
which are independent of the actual NV code. This is what this series
is about.

The first patch places M2 on the naughty list of broken SEIS
implementations, just like the M1 before it. The second patch allows
a vgic MI to be registered, even if this MI cannot be masked (we
disable it at the source anyway). The last patch hacks the AIC driver
to actually register the vgic MI with KVM.

I plan to take the first patch as a fix for 6.2, while the rest can be
deferred to 6.3.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.2-WIP

Marc Zyngier (3):
  KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS
implementations
  KVM: arm64: vgic: Allow registration of a non-maskable maintenance
interrupt
  irqchip/apple-aic: Register vgic maintenance interrupt with KVM

 arch/arm64/include/asm/cputype.h |  4 +++
 arch/arm64/kvm/vgic/vgic-init.c  |  2 +-
 arch/arm64/kvm/vgic/vgic-v3.c|  3 +-
 drivers/irqchip/irq-apple-aic.c  | 55 
 4 files changed, 49 insertions(+), 15 deletions(-)

-- 
2.34.1



[PATCH 2/3] KVM: arm64: vgic: Allow registration of a non-maskable maintenance interrupt

2023-01-03 Thread Marc Zyngier
Our Apple M1/M2 friends do have a per-CPU maintenance interrupt,
but no mask to make use of it in the standard Linux framework.

Given that KVM directly drives the *source* of the interrupt and
leaves the GIC interrupt always enabled, there is no harm in tolerating
such a setup. It will become useful once we enable NV on M2 HW.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/vgic/vgic-init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index f6d4f4052555..e61d9ca01768 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -572,7 +572,7 @@ int kvm_vgic_hyp_init(void)
if (ret)
return ret;
 
-   if (!has_mask)
+   if (!has_mask && !kvm_vgic_global_state.maint_irq)
return 0;
 
ret = request_percpu_irq(kvm_vgic_global_state.maint_irq,
-- 
2.34.1



[PATCH 1/3] KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations

2023-01-03 Thread Marc Zyngier
I really hoped that Apple had fixed their not-quite-a-vgic implementation
when moving from M1 to M2. Alas, it seems they didn't, and running
a buggy EFI version results in the vgic generating SErrors outside
of the guest and taking the host down.

Apply the same workaround as for M1. Yes, this is all a bit crap.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/cputype.h | 4 
 arch/arm64/kvm/vgic/vgic-v3.c| 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 4e8b66c74ea2..683ca3af4084 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -124,6 +124,8 @@
 #define APPLE_CPU_PART_M1_FIRESTORM_PRO0x025
 #define APPLE_CPU_PART_M1_ICESTORM_MAX 0x028
 #define APPLE_CPU_PART_M1_FIRESTORM_MAX0x029
+#define APPLE_CPU_PART_M2_BLIZZARD 0x032
+#define APPLE_CPU_PART_M2_AVALANCHE0x033
 
 #define AMPERE_CPU_PART_AMPERE10xAC3
 
@@ -177,6 +179,8 @@
 #define MIDR_APPLE_M1_FIRESTORM_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
APPLE_CPU_PART_M1_FIRESTORM_PRO)
 #define MIDR_APPLE_M1_ICESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
APPLE_CPU_PART_M1_ICESTORM_MAX)
 #define MIDR_APPLE_M1_FIRESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
APPLE_CPU_PART_M1_FIRESTORM_MAX)
+#define MIDR_APPLE_M2_BLIZZARD MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
APPLE_CPU_PART_M2_BLIZZARD)
+#define MIDR_APPLE_M2_AVALANCHE MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, 
APPLE_CPU_PART_M2_AVALANCHE)
 #define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, 
AMPERE_CPU_PART_AMPERE1)
 
 /* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 826ff6f2a4e7..c6442b08fe80 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -615,7 +615,8 @@ static const struct midr_range broken_seis[] = {
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_PRO),
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_PRO),
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_MAX),
-   MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_MAX),
+   MIDR_ALL_VERSIONS(MIDR_APPLE_M2_BLIZZARD),
+   MIDR_ALL_VERSIONS(MIDR_APPLE_M2_AVALANCHE),
{},
 };
 
-- 
2.34.1



Re: [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.

2023-01-02 Thread Marc Zyngier
On Thu, 29 Dec 2022 13:53:15 +,
Marc Zyngier  wrote:
> 
> On Wed, 24 Aug 2022 07:03:03 +0100,
> Ganapatrao Kulkarni  wrote:
> > 
> > Guest-Hypervisor forwards the timer interrupt to Guest-Guest, if it is
> > enabled, unmasked and ISTATUS bit of register CNTV_CTL_EL0 is set for a
> > loaded timer.
> > 
> > For NV2 implementation, the Host-Hypervisor is not emulating the ISTATUS
> > bit while forwarding the Emulated Vtimer Interrupt to Guest-Hypervisor.
> > This results in the drop of interrupt from Guest-Hypervisor, whereas
> > Host Hypervisor marked it as an active interrupt and expecting Guest-Guest
> > to consume and acknowledge. Due to this, some of the Guest-Guest vCPUs
> > are stuck in Idle thread and rcu soft lockups are seen.
> > 
> > This issue is not seen with NV1 case since the register CNTV_CTL_EL0 read
> > trap handler is emulating the ISTATUS bit.
> > 
> > Adding code to set/emulate the ISTATUS when the emulated timers are fired.
> > 
> > Signed-off-by: Ganapatrao Kulkarni 
> > ---
> >  arch/arm64/kvm/arch_timer.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index 27a6ec46803a..0b32d943d2d5 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
> >   struct arch_timer_context *timer,
> >   enum kvm_arch_timer_regs treg);
> >  static bool kvm_arch_timer_get_input_level(int vintid);
> > +static u64 read_timer_ctl(struct arch_timer_context *timer);
> >  
> >  static struct irq_ops arch_timer_irq_ops = {
> > .get_input_level = kvm_arch_timer_get_input_level,
> > @@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct 
> > hrtimer *hrt)
> > return HRTIMER_RESTART;
> > }
> >  
> > +   /* Timer emulated, emulate ISTATUS also */
> > +   timer_set_ctl(ctx, read_timer_ctl(ctx));
> 
> Why should we do that for non-NV2 configurations?
> 
> > kvm_timer_update_irq(vcpu, true, ctx);
> > return HRTIMER_NORESTART;
> >  }
> > @@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context 
> > *ctx)
> > trace_kvm_timer_emulate(ctx, should_fire);
> >  
> > if (should_fire != ctx->irq.level) {
> > +   /* Timer emulated, emulate ISTATUS also */
> > +   timer_set_ctl(ctx, read_timer_ctl(ctx));
> > kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
> > return;
> > }
> 
> I'm not overly keen on this. Yes, we can set the status bit there. But
> conversely, the bit will not get cleared when the guest reprograms the
> timer, and will take a full exit/entry cycle for it to appear.
> 
> Ergo, the architecture is buggy, as memory (the VNCR page) cannot be
> used to emulate something as dynamic as a timer.
> 
> It is only with FEAT_ECV that we can solve this correctly by trapping
> the counter/timer accesses and emulate them for the guest hypervisor.
> I'd rather we add support for that, as I expect all the FEAT_NV2
> implementations to have it (and hopefully FEAT_FGT as well).

So I went ahead and implemented some very basic FEAT_ECV support to
correctly emulate the timers (trapping the CTL/CVAL accesses).

Performance dropped like a rock (~30% extra overhead) for L2
exit-heavy workloads that are terminated in userspace, such as virtio.
For those workloads, vcpu_{load,put}() in L1 now generate extra traps,
as we save/restore the timer context, and this is enough to make
things visibly slower, even on a pretty fast machine.

I managed to get *some* performance back by satisfying CTL/CVAL reads
very early on the exit path (a pretty common theme with NV). Which
means we end-up needing something like what you have -- only a bit
more complete. I came up with the following:

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 4945c5b96f05..a198a6211e2a 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -450,6 +450,25 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, 
bool new_level,
 {
int ret;
 
+   /*
+* Paper over NV2 brokenness by publishing the interrupt status
+* bit. This still results in a poor quality of emulation (guest
+* writes will have no effect until the next exit).
+*
+* But hey, it's fast, right?
+*/
+   if (vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu) &&
+   (timer_ctx == vcpu_vtimer(vcpu) || timer_ctx == vcpu_ptimer(v

Re: [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size.

2022-12-29 Thread Marc Zyngier
On Wed, 24 Aug 2022 07:03:04 +0100,
Ganapatrao Kulkarni  wrote:
> 
> In NV case, Shadow stage 2 page table is created using host hypervisor
> page table configuration like page size, block size etc. Also, the shadow
> stage 2 table uses block level mapping if the Guest Hypervisor IPA is
> backed by the THP pages. However, this is resulting in illegal mapping of
> NestedVM IPA to Host Hypervisor PA, when Guest Hypervisor and Host
> hypervisor are configured with different pagesize.
> 
> Adding fix to avoid block level mapping in stage 2 mapping if
> max_map_size is smaller than the block size.
> 
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  arch/arm64/kvm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 6caa48da1b2e..3d4b53f153a1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1304,7 +1304,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>* backed by a THP and thus use block mapping if possible.
>*/
>   if (vma_pagesize == PAGE_SIZE &&
> - !(max_map_size == PAGE_SIZE || device)) {
> + !(max_map_size < PMD_SIZE || device)) {
>   if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
>   vma_pagesize = fault_granule;
>   else

That's quite a nice catch. I guess this was the main issue with
running 64kB L1 on a 4kB L0? Now, I'm not that fond of the fix itself,
and I think max_map_size should always represent something that is a
valid size *on the host*, especially when outside of NV-specific code.

How about something like this instead:

@@ -1346,6 +1346,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
 * table uses at least as big a mapping.
 */
max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+
+   if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
+   max_map_size = PMD_SIZE;
+   else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
+   max_map_size = PAGE_SIZE;
}
 
vma_pagesize = min(vma_pagesize, max_map_size);


Admittedly, this is a lot uglier than your fix. But it keeps the nested
horror localised, and doesn't risk being reverted by accident by
people who would not take NV into account (can't blame them, really).

Can you please give it a go?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.

2022-12-29 Thread Marc Zyngier
On Wed, 24 Aug 2022 07:03:03 +0100,
Ganapatrao Kulkarni  wrote:
> 
> Guest-Hypervisor forwards the timer interrupt to Guest-Guest, if it is
> enabled, unmasked and ISTATUS bit of register CNTV_CTL_EL0 is set for a
> loaded timer.
> 
> For NV2 implementation, the Host-Hypervisor is not emulating the ISTATUS
> bit while forwarding the Emulated Vtimer Interrupt to Guest-Hypervisor.
> This results in the drop of interrupt from Guest-Hypervisor, whereas
> Host Hypervisor marked it as an active interrupt and expecting Guest-Guest
> to consume and acknowledge. Due to this, some of the Guest-Guest vCPUs
> are stuck in Idle thread and rcu soft lockups are seen.
> 
> This issue is not seen with NV1 case since the register CNTV_CTL_EL0 read
> trap handler is emulating the ISTATUS bit.
> 
> Adding code to set/emulate the ISTATUS when the emulated timers are fired.
> 
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  arch/arm64/kvm/arch_timer.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 27a6ec46803a..0b32d943d2d5 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
> struct arch_timer_context *timer,
> enum kvm_arch_timer_regs treg);
>  static bool kvm_arch_timer_get_input_level(int vintid);
> +static u64 read_timer_ctl(struct arch_timer_context *timer);
>  
>  static struct irq_ops arch_timer_irq_ops = {
>   .get_input_level = kvm_arch_timer_get_input_level,
> @@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct 
> hrtimer *hrt)
>   return HRTIMER_RESTART;
>   }
>  
> + /* Timer emulated, emulate ISTATUS also */
> + timer_set_ctl(ctx, read_timer_ctl(ctx));

Why should we do that for non-NV2 configurations?

>   kvm_timer_update_irq(vcpu, true, ctx);
>   return HRTIMER_NORESTART;
>  }
> @@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
>   trace_kvm_timer_emulate(ctx, should_fire);
>  
>   if (should_fire != ctx->irq.level) {
> + /* Timer emulated, emulate ISTATUS also */
> + timer_set_ctl(ctx, read_timer_ctl(ctx));
>   kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
>   return;
>   }

I'm not overly keen on this. Yes, we can set the status bit there. But
conversely, the bit will not get cleared when the guest reprograms the
timer, and will take a full exit/entry cycle for it to appear.

Ergo, the architecture is buggy, as memory (the VNCR page) cannot be
used to emulate something as dynamic as a timer.

It is only with FEAT_ECV that we can solve this correctly by trapping
the counter/timer accesses and emulate them for the guest hypervisor.
I'd rather we add support for that, as I expect all the FEAT_NV2
implementations to have it (and hopefully FEAT_FGT as well).
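
To make the ISTATUS point concrete: for an emulated timer, ISTATUS is
not state to be stored, it is a function of ENABLE, CVAL and the current
counter at the time of the read. A rough sketch of deriving the
guest-visible CTL value (standalone, with a made-up struct rather than
KVM's internal types):

#include <stdint.h>

#define ARCH_TIMER_CTRL_ENABLE		(1U << 0)
#define ARCH_TIMER_CTRL_IT_MASK		(1U << 1)
#define ARCH_TIMER_CTRL_IT_STAT		(1U << 2)

/* Saved state of one emulated timer (hypothetical). */
struct emul_timer {
	uint64_t cval;	/* CNTx_CVAL as last written by the guest */
	uint32_t ctl;	/* CNTx_CTL as last written by the guest */
};

/*
 * CTL value the guest should read back: ISTATUS is set once the timer
 * is enabled and CVAL <= the (offset-adjusted) counter, regardless of
 * whether the interrupt is masked by IMASK.
 */
static uint32_t emulated_ctl(const struct emul_timer *t, uint64_t counter)
{
	uint32_t ctl = t->ctl & ~ARCH_TIMER_CTRL_IT_STAT;

	if ((ctl & ARCH_TIMER_CTRL_ENABLE) && t->cval <= counter)
		ctl |= ARCH_TIMER_CTRL_IT_STAT;

	return ctl;
}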

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired

2022-12-29 Thread Marc Zyngier
On Wed, 24 Aug 2022 07:03:02 +0100,
Ganapatrao Kulkarni  wrote:
> 
> From: D Scott Phillips 
> 
> The timer emulation logic goes into an infinite loop when the NestedVM(L2)
> timer is being emulated.
> 
> While the CPU is executing in L1 context, the L2 timers are emulated using
> host hrtimer. When the delta of cval and current time reaches zero, the
> vtimer interrupt is fired/forwarded to L2, however the emulation function
> in Host-Hypervisor(L0) is still restarting the hrtimer with an expiry time
> set to now, triggering hrtimer to fire immediately and resulting in a
> continuous trigger of hrtimer and endless looping in the timer emulation.
> 
> Adding a fix to avoid restarting of the hrtimer if the interrupt is
> already fired.
> 
> Signed-off-by: D Scott Phillips 
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  arch/arm64/kvm/arch_timer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 2371796b1ab5..27a6ec46803a 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -472,7 +472,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
>   return;
>   }
>  
> - soft_timer_start(>hrtimer, kvm_timer_compute_delta(ctx));
> + if (!ctx->irq.level)
> + soft_timer_start(>hrtimer, kvm_timer_compute_delta(ctx));
>  }
>  
>  static void timer_save_state(struct arch_timer_context *ctx)

I think this is a regression introduced by bee038a67487 ("KVM:
arm/arm64: Rework the timer code to use a timer_map"), and you can see
it because the comment in this function doesn't make much sense
anymore.

Does the following work for you, mostly restoring the original code?

Thanks,

M.

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index ad2a5df88810..4945c5b96f05 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -480,7 +480,7 @@ static void timer_emulate(struct arch_timer_context *ctx)
 * scheduled for the future.  If the timer cannot fire at all,
 * then we also don't need a soft timer.
 */
-   if (!kvm_timer_irq_can_fire(ctx)) {
+   if (should_fire || !kvm_timer_irq_can_fire(ctx)) {
soft_timer_cancel(>hrtimer);
return;
}

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 00/50] KVM: Rework kvm_init() and hardware enabling

2022-12-28 Thread Marc Zyngier

On 2022-12-27 13:02, Paolo Bonzini wrote:
Queued, thanks.  I will leave this in kvm/queue after testing everything
else and moving it to kvm/next; this way, we can wait for test results
on other architectures.


Can you please make this a topic branch, and if possible based
on a released -rc? It would make it a lot easier for everyone.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: running openbsd on KVM running on fedora over raspberry pi 4

2022-12-26 Thread Marc Zyngier
On Fri, 23 Dec 2022 11:33:35 +,
Sandeep Gupta  wrote:
> 
> I am trying to run openbsd as guest OS.
> I am using this command to create the vm
> ```
> 
> virt-install --name openbsd1 --ram 2048 --vcpus 2 --disk
> path=/var/lib/libvirt/images/openbsd1.qcow2,format=qcow2,bus=virtio,size=20
>  --disk path=/tmp/install72.img --import --os-variant openbsd7.0
> --network=default --noautoconsole
> 
> ```
> But, on boot the server is not picking up the openbsd boot sequence.

I don't think this is directly related to KVM. I've been pretty
successful in running OpenBSD 7.0 on a variety of hosts. Not using
libvirt though, but directly using QEMU.

One thing you may want to do is to disable ACPI by passing -no-acpi to
QEMU.

But overall, this is a question better asked on some libvirt forum.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v4 7/7] KVM: arm64: Normalize cache configuration

2022-12-25 Thread Marc Zyngier
On Wed, 21 Dec 2022 20:40:16 +,
Akihiko Odaki  wrote:
> 
> Before this change, the cache configuration of the physical CPU was
> exposed to vcpus. This is problematic because the cache configuration a
> vcpu sees varies when it migrates between vcpus with different cache
> configurations.
> 
> Fabricate cache configuration from the sanitized value, which holds the
> CTR_EL0 value the userspace sees regardless of which physical CPU it
> resides on.
> 
> CLIDR_EL1 and CCSIDR_EL1 are now writable from the userspace so that
> the VMM can restore the values saved with the old kernel.
> 
> Suggested-by: Marc Zyngier 
> Signed-off-by: Akihiko Odaki 
> ---
>  arch/arm64/include/asm/cache.h|   3 +
>  arch/arm64/include/asm/kvm_host.h |   4 +
>  arch/arm64/kvm/reset.c|   1 +
>  arch/arm64/kvm/sys_regs.c | 229 +-
>  4 files changed, 141 insertions(+), 96 deletions(-)

[...]

>  /* Which cache CCSIDR represents depends on CSSELR value. */
> -static u32 get_ccsidr(u32 csselr)
> +static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
> +{
> + if (vcpu->arch.ccsidr)
> + return vcpu->arch.ccsidr[csselr];
> +
> + /*
> +  * Fabricate a CCSIDR value as the overriding value does not exist.
> +  * The real CCSIDR value will not be used as it can vary by the
> +  * physical CPU which the vcpu currently resides in.
> +  *
> +  * The line size is determined with get_min_cache_line_size(), which
> +  * should be valid for all CPUs even if they have different cache
> +  * configuration.
> +  *
> +  * The associativity bits are cleared, meaning the geometry of all data
> +  * and unified caches (which are guaranteed to be PIPT and thus
> +  * non-aliasing) are 1 set and 1 way.
> +  * Guests should not be doing cache operations by set/way at all, and
> +  * for this reason, we trap them and attempt to infer the intent, so
> +  * that we can flush the entire guest's address space at the appropriate
> +  * time. The exposed geometry minimizes the number of the traps.
> +  * [If guests should attempt to infer aliasing properties from the
> +  * geometry (which is not permitted by the architecture), they would
> +  * only do so for virtually indexed caches.]
> +  */
> + return get_min_cache_line_size(csselr) << CCSIDR_EL1_LineSize_SHIFT;

It'd be nice to have a comment that says this relies on CCSIDR_EL1
being allowed to return an UNKNOWN value when CSSELR_EL1 does not
specify an implemented cache level (you always return something with
the I or D line-size).

> +}
> +
> +static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
>  {
> - u32 ccsidr;
> + u8 line_size = FIELD_GET(CCSIDR_EL1_LineSize, val);
> + u32 *ccsidr = vcpu->arch.ccsidr;
> + u32 i;
> +
> + if ((val & CCSIDR_EL1_RES0) || line_size < 
> get_min_cache_line_size(csselr))
> + return -EINVAL;
> +
> + if (!ccsidr) {
> + if (val == get_ccsidr(vcpu, csselr))
> + return 0;
> +
> + ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32), GFP_KERNEL);
> + if (!ccsidr)
> + return -ENOMEM;
> +
> + for (i = 0; i < CSSELR_MAX; i++)
> + ccsidr[i] = get_ccsidr(vcpu, i);
> +
> + vcpu->arch.ccsidr = ccsidr;
> + }
>  
> - /* Make sure noone else changes CSSELR during this! */
> - local_irq_disable();
> - write_sysreg(csselr, csselr_el1);
> - isb();
> - ccsidr = read_sysreg(ccsidr_el1);
> - local_irq_enable();
> + ccsidr[csselr] = val;
>  
> - return ccsidr;
> + return 0;
>  }
>  
>  /*
> @@ -1281,10 +1332,64 @@ static bool access_clidr(struct kvm_vcpu *vcpu, 
> struct sys_reg_params *p,
>   if (p->is_write)
>   return write_to_read_only(vcpu, p, r);
>  
> - p->regval = read_sysreg(clidr_el1);
> + p->regval = __vcpu_sys_reg(vcpu, r->reg);
>   return true;
>  }
>  
> +/*
> + * Fabricate a CLIDR_EL1 value instead of using the real value, which can 
> vary
> + * by the physical CPU which the vcpu currently resides in.
> + */
> +static void reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> +{
> + u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + u64 clidr;
> + u8 loc;
> +
> + if ((ctr_el0 & CTR_EL0_IDC) || 
> cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {

Having looked into this again, I *think* we can drop the FWB check, as
the above read_sanitised_ftr_reg() is populated from
read_cpuid_effective_cachetype

Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

2022-12-24 Thread Marc Zyngier
On Thu, 22 Dec 2022 13:01:55 +,
Ard Biesheuvel  wrote:
> 
> On Tue, 20 Dec 2022 at 21:09, Marc Zyngier  wrote:
> >
> > A recent development on the EFI front has resulted in guests having
> > their page tables baked in the firmware binary, and mapped into
> > the IPA space as part of a read-only memslot.
> >
> > Not only is this legitimate, but it also results in added security,
> > so thumbs up. However, this clashes mildly with our handling of a S1PTW
> > as a write to correctly handle AF/DB updates to the S1 PTs, and results
> > in the guest taking an abort it won't recover from (the PTs mapping the
> > vectors will suffer from the same problem...).
> >
> > So clearly our handling is... wrong.
> >
> > Instead, switch to a two-pronged approach:
> >
> > - On S1PTW translation fault, handle the fault as a read
> >
> > - On S1PTW permission fault, handle the fault as a write
> >
> > This is of no consequence to SW that *writes* to its PTs (the write
> > will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> > use AF/DB anyway, as that'd be wrong.
> >
> > Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> > fault on S1PTW permission fault on instruction fetch") do we end-up
> > with two back-to-back faults (page being evicted and faulted back).
> > I don't think this is a case worth optimising for.
> >
> > Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission 
> > fault on instruction fetch")
> > Signed-off-by: Marc Zyngier 
> > Cc: sta...@vger.kernel.org
> 
> Reviewed-by: Ard Biesheuvel 
> 
> I have tested this patch on my TX2 with one of the EFI builds in
> question, and everything works as before (I never observed the issue
> itself)

If you get the chance, could you try with non-4kB page sizes? Here, I
could only reproduce it with 16kB pages. It was firing like clockwork
on Cortex-A55 with that.

> 
> Regression-tested-by: Ard Biesheuvel 
> 
> For the record, the EFI build in question targets QEMU/mach-virt and
> switches to a set of read-only page tables in emulated NOR flash
> straight out of reset, so it can create and populate the real page
> tables with MMU and caches enabled. EFI does not use virtual memory or
> paging so managing access flags or dirty bits in hardware is unlikely
> to add any value, and it is not being used at the moment. And given
> that this is emulated NOR flash, any ordinary write to it tears down
> the r/o memslot altogether, and kicks the NOR flash emulation in QEMU
> into programming mode, which is fully based on MMIO emulation and does
> not use a memslot at all. IOW, even if we could figure out what store
> the PTW was attempting to do, it is always going to be rejected since
> the r/o page tables can only be modified by 'programming' the NOR
> flash sector.

Indeed, and this would be a pretty dodgy setup anyway.

Thanks for having had a look,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-24 Thread Marc Zyngier
On Thu, 22 Dec 2022 20:58:40 +,
Oliver Upton  wrote:
> 
> On Thu, Dec 22, 2022 at 09:01:15AM +, Marc Zyngier wrote:
> > On Wed, 21 Dec 2022 17:46:24 +, Oliver Upton  
> > wrote:
> > >  - When UFFD is in use, translation faults are reported to userspace as
> > >writes when from a RW memslot and reads when from an RO memslot.
> > 
> > Not quite: translation faults are reported as reads if TCR_EL1.HA
> > isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> > this matches exactly the behaviour of the page-table walker, which
> > will update the S1 PTs only if this bit is set.
> 
> My bad, yes you're right. I conflated the use case here with the
> architectural state.
> 
> I'm probably being way too pedantic, but I just wanted to make sure we
> agree about the ensuing subtlety. More below:
> 
> > Or is it what userfaultfd does on its own? That'd be confusing...
> > 
> > > 
> > >  - S1 page table memory is spuriously marked as dirty, as we presume a
> > >write immediately follows the translation fault. That isn't entirely
> > >senseless, as it would mean both the target page and the S1 PT that
> > >maps it are both old. This is nothing new I suppose, just weird.
> > 
> > s/old/young/ ?
> > 
> > I think you're confusing the PT access with the access that caused the
> > PT access (I'll have that printed on a t-shirt, thank you very much).
> 
> I'd buy it!
> 
> > Here, we're not considering the cause of the PT access anymore. If
> > TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> > read, and only that page.
> 
> I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
> a write could possibly follow, but I don't think it requires it. The
> page table walker must first load the S1 PTE before writing to it.

Ah, you're talking of the write to the PTE. Too many writes!

My reasoning is based on Rule LFTXR in DDI0487I.a, which says:

"When the PE performs a hardware update of the AF, it sets the AF to 1
 in the corresponding descriptor in memory, in a coherent manner,
 using an atomic read-modify-write of that descriptor."

An atomic-or operation fits this description, and I cannot see
anything in the architecture that would prevent the write of a PTE
even if AF is already set, such as mandating something like a
test-and-set or compare-and-swap.

I'm not saying this is the only possible implementation, or even a
good one. But I don't think this is incompatible with what the
architecture mandates.
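
To make the two readings concrete, here is a small, purely illustrative
userspace sketch of the two update strategies, using C11 atomics on a
bare 64-bit "descriptor" with AF at bit 10 (as in the ASL quoted below).
This is not how any particular PE implements it; it only shows that both
variants satisfy "atomic read-modify-write", while only the second one
avoids the store when AF is already set:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define DESC_AF	(UINT64_C(1) << 10)	/* AF is descriptor bit 10 */

/* Unconditional atomic-OR: stores to the PTE even if AF is already set. */
static void set_af_atomic_or(_Atomic uint64_t *desc)
{
	atomic_fetch_or(desc, DESC_AF);
}

/* Read-check-CAS: only stores when AF was actually clear. */
static void set_af_cas(_Atomic uint64_t *desc)
{
	uint64_t old = atomic_load(desc);

	while (!(old & DESC_AF) &&
	       !atomic_compare_exchange_weak(desc, &old, old | DESC_AF))
		;
}

int main(void)
{
	_Atomic uint64_t desc = DESC_AF;	/* AF already set */

	set_af_atomic_or(&desc);	/* still performs a write */
	set_af_cas(&desc);		/* performs no write in this case */
	printf("desc = %#llx\n", (unsigned long long)desc);
	return 0;
}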

> 
> From AArch64.S1Translate() (DDI0487H.a):
> 
> (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, 
> walkparams, va, regime,
>ss, acctype, 
> iswrite, ispriv);
> 
> [...]
> 
> new_desc = descriptor;
> if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
>   // Set descriptor AF bit
>   new_desc<10> = '1';
> 
> [...]
> 
> // Either the access flag was clear or AP<2> is set
> if new_desc != descriptor then
>   if regime == Regime_EL10 && EL2Enabled() then
> s1aarch64 = TRUE;
>   s2fs1walk = TRUE;
>   aligned = TRUE;
>   iswrite = TRUE;
>   (s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, 
> s1aarch64,
>  ss, s2fs1walk, 
> AccType_ATOMICRW,
>  aligned, iswrite, 
> ispriv);
> 
> if s2fault.statuscode != Fault_None then
>   return (s2fault, AddressDescriptor UNKNOWN);
> else
>   descupdateaddress = descaddress;
> 
> (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
>walkparams.ee, 
> descupdateaddress)
> 
> Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
> descriptor. The second stage-2 walk for write is conditioned on having
> already fetched the stage-1 descriptor and determining the AF needs
> to be set.

The question is whether this is one possible implementation, or the
only possible implementation. My bet is on the former.

> Relating back to UFFD: if we expect KVM to do exactly what hardware
> does, UFFD should see an attempted read when the first walk fails
> because of an S2 translation fault. Based on this patch, though, we'd
> promote it to a write if TCR_EL1.HA == 1.
> 
> This has the additional nuance of marking the S1 PT's IPA as dirty, even
> though it might not actually have been written to. Having said that,
> the fal

Re: [PATCH 2/2] KVM: arm64: Remove use of ARM64_FEATURE_MASK()

2022-12-22 Thread Marc Zyngier
On Wed, 21 Dec 2022 18:06:10 +,
Mark Brown  wrote:
> 
> The KVM code makes extensive use of ARM64_FEATURE_MASK() to generate a
> mask for fields in the ID registers. This macro has the assumption that
> all feature fields are 4 bits wide but the architecture has evolved to
> add fields with other widths, such as the 1 bit fields in ID_AA64SMFR0_EL1,
> so we need to adjust the
> 
> We could fix this by making ARM64_FEATURE_MASK() use the generated macros
> that we have now but since one of these is a direct _MASK constant the
> result is something that's more verbose and less direct than just updating
> the users to directly use the generated mask macros, writing
> 
>   #define ARM64_FEATURE_MASK(x)   (x##_MASK)
> 
> obviously looks redundant and if we look at the users updating them turns
> 
>   val &= ~ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_CSV3);
> 
> into the more direct
> 
>   val &= ~ID_AA64PFR0_EL1_CSV3_MASK;

If the two are strictly equivalent, then let's use the former as it
results in a tiny diff.

Constantly repainting these files causes no end of conflicts when
rebasing large series (pKVM, NV...), and makes backporting of fixes
much harder than it should be. Especially considering that there is a
single occurrence of an ID register with non-4bit fields.

Just put a FIXME in the various files so that people do the repainting
as they change this code.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-22 Thread Marc Zyngier
On Wed, 21 Dec 2022 17:46:24 +,
Oliver Upton  wrote:
> 
> On Wed, Dec 21, 2022 at 08:46:06AM -0800, Ricardo Koller wrote:
> 
> [...]
> 
> > > - return false;
> > > + /* Can't introspect TCR_EL1 with pKVM */
> > > + if (kvm_vm_is_protected(vcpu->kvm))
> > > + return false;
> > > +
> > > + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > + afdb = cpuid_feature_extract_unsigned_field(mmfr1, 
> > > ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > > +
> > > + if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > > + return false;
> > > +
> > > + return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> > 
> > Also tested this specific case using page_fault_test when the PT page is
> > marked for dirty logging with and without AF. In both cases there's a
> > single _FSC_FAULT (no PERM_FAULT) as expected, and the PT page is marked 
> > dirty
> > in the AF case. The RO and UFFD cases also work as expected.
> > 
> > Need to send some changes for page_fault_test as many tests assume that
> > any S1PTW is always a PT write, and are failing. Also need to add some new
> > tests for PTs in RO memslots (as it didn't make much sense before this
> > change).
> 
> So I actually wanted to bring up the issue of user visibility, glad your
> test picked up something.
> 
> This has two implications, which are rather odd.
> 
>  - When UFFD is in use, translation faults are reported to userspace as
>writes when from a RW memslot and reads when from an RO memslot.

Not quite: translation faults are reported as reads if TCR_EL1.HA
isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
this matches exactly the behaviour of the page-table walker, which
will update the S1 PTs only if this bit is set.

Or is it what userfaultfd does on its own? That'd be confusing...

> 
>  - S1 page table memory is spuriously marked as dirty, as we presume a
>write immediately follows the translation fault. That isn't entirely
>senseless, as it would mean both the target page and the S1 PT that
>maps it are both old. This is nothing new I suppose, just weird.

s/old/young/ ?

I think you're confusing the PT access with the access that caused the
PT access (I'll have that printed on a t-shirt, thank you very much).

Here, we're not considering the cause of the PT access anymore. If
TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
read, and only that page.

TCR_EL1.HD is what muddies the waters a bit. If it is set without HA
being set, we still handle the translation fault as a read, followed
by a write permission fault. But again, that's solely for the purpose
of the S1 PT. What happens for the mapped page is completely
independent.
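
As an aside for anyone trying to keep the two patches straight, the
classification being argued about can be restated as a small standalone
helper. This is illustrative C only, not the kvm_emulate.h code: the FSC
constants are written out with the values I believe they have in esr.h,
TCR_EL1.HA is assumed to be bit 39, and the pKVM/HAFDBS sanity checks of
the real patch are elided:

#include <stdbool.h>
#include <stdint.h>

#define ESR_ELx_FSC_TYPE	UINT64_C(0x3c)	/* FSC with the level bits masked */
#define ESR_ELx_FSC_PERM	UINT64_C(0x0c)	/* permission fault */
#define TCR_HA			(UINT64_C(1) << 39)	/* TCR_EL1.HA */

/* Classify a fault already known to be an S1PTW abort. */
static bool s1ptw_is_write(uint64_t esr, uint64_t guest_tcr)
{
	switch (esr & ESR_ELx_FSC_TYPE) {
	case ESR_ELx_FSC_PERM:
		/* The walker was blocked while updating AF/DB: a write. */
		return true;
	default:
		/*
		 * Translation fault (or anything else): a read, unless the
		 * guest uses the HW-managed Access Flag, in which case a
		 * PTE update is guaranteed to follow.
		 */
		return guest_tcr & TCR_HA;
	}
}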

> Marc, do you have any concerns about leaving this as-is for the time
> being? At least before we were doing the same thing (write fault) every
> time.

I have the ugly feeling we're talking at cross purposes here, mostly
because I don't get how userfaultfd fits in that picture. Can you shed
some light here?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

2022-12-21 Thread Marc Zyngier
On Wed, 21 Dec 2022 16:50:30 +,
Oliver Upton  wrote:
> 
> On Wed, Dec 21, 2022 at 09:35:06AM +, Marc Zyngier wrote:
> 
> [...]
> 
> > > > +   if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > > > +   /*
> > > > +* Only a permission fault on a S1PTW should be
> > > > +* considered as a write. Otherwise, page tables baked
> > > > +* in a read-only memslot will result in an exception
> > > > +* being delivered in the guest.
> > > 
> > > Somewhat of a tangent, but:
> > > 
> > > Aren't we somewhat unaligned with the KVM UAPI by injecting an
> > > exception in this case? I know we've been doing it for a while, but it
> > > flies in the face of the rules outlined in the
> > > KVM_SET_USER_MEMORY_REGION documentation.
> > 
> > That's an interesting point, and I certainly haven't considered that
> > for faults introduced by page table walks.
> > 
> > I'm not sure what userspace can do with that though. The problem is
> > that this is a write for which we don't have useful data: although we
> > know it is a page-table walker access, we don't know what it was about
> > to write. The instruction that caused the write is meaningless (it
> > could either be a load, a store, or an instruction fetch). How do you
> > populate the data[] field then?
> > 
> > If anything, this is closer to KVM_EXIT_ARM_NISV, for which we give
> > userspace the full ESR and ask it to sort it out. I doubt it will be
> > able to, but hey, maybe it is worth a shot. This would need to be a
> > different exit reason though, as NISV is explicitly for non-memslot
> > stuff.
> > 
> > In any case, the documentation for KVM_SET_USER_MEMORY_REGION needs to
> > reflect the fact that KVM_EXIT_MMIO cannot represent a fault due to a
> > S1 PTW.
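
For reference, the exit payload being discussed is the mmio member of
struct kvm_run. The sketch below mirrors the UAPI layout from memory
using plain stdint types (check include/uapi/linux/kvm.h for the
authoritative definition); the point above is that for a page-table
walker write there is nothing sensible KVM could put in data[]:

#include <stdint.h>

/* Approximate shape of the KVM_EXIT_MMIO payload seen by the VMM. */
struct kvm_exit_mmio_view {
	uint64_t phys_addr;	/* faulting guest physical address */
	uint8_t  data[8];	/* bytes being written, or buffer to fill on a read */
	uint32_t len;		/* access size in bytes */
	uint8_t  is_write;
};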
> 
> Oh I completely agree with you here. I probably should have said before,
> I think the exit would be useless anyway. Getting the documentation in
> line with the intended behavior seems to be the best fix.

Right. How about something like this?

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 226b40baffb8..72abd018a618 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1381,6 +1381,14 @@ It is recommended to use this API instead of the 
KVM_SET_MEMORY_REGION ioctl.
 The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
 allocation and is deprecated.
 
+Note: On arm64, a write generated by the page-table walker (to update
+the Access and Dirty flags, for example) never results in a
+KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
+is because KVM cannot provide the data that would be written by the
+page-table walker, making it impossible to emulate the access.
+Instead, an abort (data abort if the cause of the page-table update
+was a load or a store, instruction abort if it was an instruction
+fetch) is injected in the guest.
 
 4.36 KVM_SET_TSS_ADDR
 -

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-21 Thread Marc Zyngier
Hi Ricardo,

On Wed, 21 Dec 2022 16:46:06 +,
Ricardo Koller  wrote:
> 
> Hello,
> 
> On Tue, Dec 20, 2022 at 08:09:22PM +, Marc Zyngier wrote:
> > As a minor optimisation, we can retrofit the "S1PTW is a write
> > even on translation fault" concept *if* the vcpu is using the
> > HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
> > to result in an update of the PTE.
> > 
> > However, we cannot do the same thing for DB, as it would require
> > us to parse the PTs to find out if the DBM bit is set there.
> > This is not going to happen.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h | 20 +++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index fd6ad8b21f85..4ee467065042 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
> > kvm_vcpu *vcpu)
> >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >  {
> > if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > +   unsigned int afdb;
> > +   u64 mmfr1;
> > +
> > /*
> >  * Only a permission fault on a S1PTW should be
> >  * considered as a write. Otherwise, page tables baked
> > @@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu 
> > *vcpu)
> >  * to map the page containing the PT (read only at
> >  * first), then a permission fault to allow the flags
> >  * to be set.
> > +*
> > +* We can improve things if the guest uses AF, as this
> > +* is guaranteed to result in a write to the PTE. For
> > +* DB, however, we'd need to parse the guest's PTs,
> > +* and that's not on. DB is crap anyway.
> >  */
> > switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> 
> Nit: fault_status is calculated once when taking the fault, and passed
> around to all users (like user_mem_abort()). Not sure if this is because
> of the extra cycles needed to get it, or just style. Anyway, maybe it
> applies here.

All these things are just fields in ESR_EL2, which we keep looking at
all the time. The compiler actually does a pretty good job at keeping
that around, especially considering that this function is inlined (at
least here, kvm_handle_guest_abort and kvm_user_mem_abort are merged
into a single monster).

So passing the parameter wouldn't change a thing, and I find the above
more readable (I know that all the information in this function is
derived from the same data structure).
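
To illustrate the point that everything used here is carved out of a
single ESR_EL2 value, a standalone decoder might look like the sketch
below (constants written out with the values I believe they have in
arch/arm64/include/asm/esr.h; this is illustrative only, not the
kvm_emulate.h helpers):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ESR_ELx_S1PTW		(UINT64_C(1) << 7)	/* fault on a stage-1 PT walk */
#define ESR_ELx_WNR		(UINT64_C(1) << 6)	/* write, not read */
#define ESR_ELx_FSC		UINT64_C(0x3f)		/* fault status code */
#define ESR_ELx_FSC_TYPE	UINT64_C(0x3c)		/* FSC with level bits masked */

struct abort_info {
	bool s1ptw;
	bool wnr;
	uint8_t fsc;
	uint8_t fsc_type;
};

/* One register read is enough: every field is derived from the same ESR. */
static struct abort_info decode_abort(uint64_t esr)
{
	return (struct abort_info){
		.s1ptw    = esr & ESR_ELx_S1PTW,
		.wnr      = esr & ESR_ELx_WNR,
		.fsc      = esr & ESR_ELx_FSC,
		.fsc_type = esr & ESR_ELx_FSC_TYPE,
	};
}

int main(void)
{
	/* Example ESR: S1PTW set, translation fault at level 2, not a write. */
	struct abort_info ai = decode_abort(0x96000086);

	printf("s1ptw=%d wnr=%d fsc=%#x\n", ai.s1ptw, ai.wnr, ai.fsc);
	return 0;
}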

> 
> > case ESR_ELx_FSC_PERM:
> > return true;
> > default:
> > -   return false;
> > +   /* Can't introspect TCR_EL1 with pKVM */
> > +   if (kvm_vm_is_protected(vcpu->kvm))
> > +   return false;
> > +
> > +   mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > +   afdb = cpuid_feature_extract_unsigned_field(mmfr1, 
> > ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > +
> > +   if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > +   return false;
> > +
> > +   return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> 
> Also tested this specific case using page_fault_test when the PT page is
> marked for dirty logging with and without AF. In both cases there's a
> single _FSC_FAULT (no PERM_FAULT) as expected, and the PT page is marked dirty
> in the AF case. The RO and UFFD cases also work as expected.

Ah, thanks for checking this.

> 
> Need to send some changes for page_fault_test as many tests assume that
> any S1PTW is always a PT write, and are failing. Also need to add some new
> tests for PTs in RO memslots (as it didn't make much sense before this
> change).

I think this is what I didn't quite grok in these tests. They
seem to verify the KVM behaviour, which is not what we should check
for.

Instead, we should check for the architectural behaviour, which is
that if HAFDBS is enabled, we can observe updates to the PTs even when
we do not write to them directly.

> 
> > }
> > }
> >  
> > -- 
> > 2.34.1
> > 
> 
> Reviewed-by: Ricardo Koller 

Thanks!

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

2022-12-21 Thread Marc Zyngier
On Tue, 20 Dec 2022 21:47:36 +,
Oliver Upton  wrote:
> 
> Hi Marc,
> 
> On Tue, Dec 20, 2022 at 08:09:21PM +, Marc Zyngier wrote:
> > A recent development on the EFI front has resulted in guests having
> > their page tables baked in the firmware binary, and mapped into
> > the IPA space as part as a read-only memslot.
> 
> as part of a
> 
> > Not only this is legitimate, but it also results in added security,
> > so thumbs up. However, this clashes mildly with our handling of a S1PTW
> > as a write to correctly handle AF/DB updates to the S1 PTs, and results
> > in the guest taking an abort it won't recover from (the PTs mapping the
> > vectors will suffer from the same problem...).
> 
> To be clear, the read-only page tables already have the AF set,
> right?  They certainly must, or else the guest isn't getting far :)

Yes, the guest definitely has the AF set in the PT, and is not trying
to use the HW-assisted AF (which obviously wouldn't work).

>
> I understand you're trying to describe _why_ we promote S1PTW to a
> write, but doing it inline with the context of the EFI issue makes
> it slightly unclear. Could you break these ideas up into two
> paragraphs and maybe spell out the fault conditions a bit more?
> 
>   A recent development on the EFI front has resulted in guests having
>   their page tables baked in the firmware binary, and mapped into the
>   IPA space as part of a read-only memslot. Not only is this legitimate,
>   but it also results in added security, so thumbs up.
> 
>   It is possible to take an S1PTW translation fault if the S1 PTs are
>   unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
>   write to correctly handle hardware AF/DB updates to the S1 PTs.
>   Furthermore, KVM injects a data abort into the guest for S1PTW writes.
>   In the aforementioned case this results in the guest taking an abort
>   it won't recover from, as the S1 PTs mapping the vectors suffer from
>   the same problem.
> 
> Dunno, maybe I stink at reading which is why I got confused in the
> first place.

Nothing wrong with you, just that my write-up is indeed sloppy. I'll
copy paste the above, thanks!

> 
> > So clearly our handling is... wrong.
> > 
> > Instead, switch to a two-pronged approach:
> > 
> > - On S1PTW translation fault, handle the fault as a read
> > 
> > - On S1PTW permission fault, handle the fault as a write
> > 
> > This is of no consequence to SW that *writes* to its PTs (the write
> > will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> > use AF/DB anyway, as that'd be wrong.
> > 
> > Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> > fault on S1PTW permission fault on instruction fetch") do we end-up
> > with two back-to-back faults (page being evicted and faulted back).
> > I don't think this is a case worth optimising for.
> > 
> > Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission 
> > fault on instruction fetch")
> > Signed-off-by: Marc Zyngier 
> > Cc: sta...@vger.kernel.org
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h | 22 --
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index 9bdba47f7e14..fd6ad8b21f85 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
> > kvm_vcpu *vcpu)
> >  
> >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >  {
> > -   if (kvm_vcpu_abt_iss1tw(vcpu))
> > -   return true;
> > +   if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > +   /*
> > +* Only a permission fault on a S1PTW should be
> > +* considered as a write. Otherwise, page tables baked
> > +* in a read-only memslot will result in an exception
> > +* being delivered in the guest.
> 
> Somewhat of a tangent, but:
> 
> Aren't we somewhat unaligned with the KVM UAPI by injecting an
> exception in this case? I know we've been doing it for a while, but it
> flies in the face of the rules outlined in the
> KVM_SET_USER_MEMORY_REGION documentation.

That's an interesting point, and I certainly haven't considered that
for faults introduced by page table walks.

I'm not sure what userspace can do with that though. The problem is
that this is a write for which we don't have useful data: although we
know it is a page-table walker access, we don't know what it was about
to write.

[PATCH 3/3] KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*

2022-12-20 Thread Marc Zyngier
The former is an AArch32 legacy, so let's move over to the
verbose (and strictly identical) version.

This involves moving some of the #defines that were private
to KVM into the more generic esr.h.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/esr.h|  9 +
 arch/arm64/include/asm/kvm_arm.h| 15 ---
 arch/arm64/include/asm/kvm_emulate.h| 20 ++--
 arch/arm64/kvm/hyp/include/hyp/fault.h  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/mmu.c| 21 -
 6 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 15b34fbfca66..206de10524e3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -114,6 +114,15 @@
 #define ESR_ELx_FSC_ACCESS (0x08)
 #define ESR_ELx_FSC_FAULT  (0x04)
 #define ESR_ELx_FSC_PERM   (0x0C)
+#define ESR_ELx_FSC_SEA_TTW0   (0x14)
+#define ESR_ELx_FSC_SEA_TTW1   (0x15)
+#define ESR_ELx_FSC_SEA_TTW2   (0x16)
+#define ESR_ELx_FSC_SEA_TTW3   (0x17)
+#define ESR_ELx_FSC_SECC   (0x18)
+#define ESR_ELx_FSC_SECC_TTW0  (0x1c)
+#define ESR_ELx_FSC_SECC_TTW1  (0x1d)
+#define ESR_ELx_FSC_SECC_TTW2  (0x1e)
+#define ESR_ELx_FSC_SECC_TTW3  (0x1f)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT  (24)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 0df3fc3a0173..26b0c97df986 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -319,21 +319,6 @@
 BIT(18) |  \
 GENMASK(16, 15))
 
-/* For compatibility with fault code shared with 32-bit */
-#define FSC_FAULT  ESR_ELx_FSC_FAULT
-#define FSC_ACCESS ESR_ELx_FSC_ACCESS
-#define FSC_PERM   ESR_ELx_FSC_PERM
-#define FSC_SEAESR_ELx_FSC_EXTABT
-#define FSC_SEA_TTW0   (0x14)
-#define FSC_SEA_TTW1   (0x15)
-#define FSC_SEA_TTW2   (0x16)
-#define FSC_SEA_TTW3   (0x17)
-#define FSC_SECC   (0x18)
-#define FSC_SECC_TTW0  (0x1c)
-#define FSC_SECC_TTW1  (0x1d)
-#define FSC_SECC_TTW2  (0x1e)
-#define FSC_SECC_TTW3  (0x1f)
-
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK (~UL(0xf))
 /*
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 4ee467065042..d67a09c07f98 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -349,16 +349,16 @@ static __always_inline u8 
kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *v
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
switch (kvm_vcpu_trap_get_fault(vcpu)) {
-   case FSC_SEA:
-   case FSC_SEA_TTW0:
-   case FSC_SEA_TTW1:
-   case FSC_SEA_TTW2:
-   case FSC_SEA_TTW3:
-   case FSC_SECC:
-   case FSC_SECC_TTW0:
-   case FSC_SECC_TTW1:
-   case FSC_SECC_TTW2:
-   case FSC_SECC_TTW3:
+   case ESR_ELx_FSC_EXTABT:
+   case ESR_ELx_FSC_SEA_TTW0:
+   case ESR_ELx_FSC_SEA_TTW1:
+   case ESR_ELx_FSC_SEA_TTW2:
+   case ESR_ELx_FSC_SEA_TTW3:
+   case ESR_ELx_FSC_SECC:
+   case ESR_ELx_FSC_SECC_TTW0:
+   case ESR_ELx_FSC_SECC_TTW1:
+   case ESR_ELx_FSC_SECC_TTW2:
+   case ESR_ELx_FSC_SECC_TTW3:
return true;
default:
return false;
diff --git a/arch/arm64/kvm/hyp/include/hyp/fault.h 
b/arch/arm64/kvm/hyp/include/hyp/fault.h
index 1b8a2dcd712f..9ddcfe2c3e57 100644
--- a/arch/arm64/kvm/hyp/include/hyp/fault.h
+++ b/arch/arm64/kvm/hyp/include/hyp/fault.h
@@ -60,7 +60,7 @@ static inline bool __get_fault_info(u64 esr, struct 
kvm_vcpu_fault_info *fault)
 */
if (!(esr & ESR_ELx_S1PTW) &&
(cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
-(esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+(esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM)) {
if (!__translate_far_to_hpfar(far, ))
return false;
} else {
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 3330d1b76bdd..07d37ff88a3f 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -367,7 +367,7 @@ static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, 
u64 *exit_code)
if (static_branch_unlikely(_v2_cpuif_trap)) {
bool valid;
 
-   valid = kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+   valid = kvm_vcpu_trap_get_fault_type(vcpu) == ESR_ELx_FSC_FAULT 
&&
kvm_vcpu_dabt_isvalid(vcpu) &&
!kvm_vcpu_abt_issea(vcpu) &&
!kvm_vcpu_abt_iss1tw(vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..a3ee3

[PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-20 Thread Marc Zyngier
As a minor optimisation, we can retrofit the "S1PTW is a write
even on translation fault" concept *if* the vcpu is using the
HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
to result in an update of the PTE.

However, we cannot do the same thing for DB, as it would require
us to parse the PTs to find out if the DBM bit is set there.
This is not going to happen.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_emulate.h | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index fd6ad8b21f85..4ee467065042 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
kvm_vcpu *vcpu)
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
if (kvm_vcpu_abt_iss1tw(vcpu)) {
+   unsigned int afdb;
+   u64 mmfr1;
+
/*
 * Only a permission fault on a S1PTW should be
 * considered as a write. Otherwise, page tables baked
@@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu 
*vcpu)
 * to map the page containing the PT (read only at
 * first), then a permission fault to allow the flags
 * to be set.
+*
+* We can improve things if the guest uses AF, as this
+* is guaranteed to result in a write to the PTE. For
+* DB, however, we'd need to parse the guest's PTs,
+* and that's not on. DB is crap anyway.
 */
switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
case ESR_ELx_FSC_PERM:
return true;
default:
-   return false;
+   /* Can't introspect TCR_EL1 with pKVM */
+   if (kvm_vm_is_protected(vcpu->kvm))
+   return false;
+
+   mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+   afdb = cpuid_feature_extract_unsigned_field(mmfr1, 
ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
+
+   if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
+   return false;
+
+   return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
}
}
 
-- 
2.34.1



[PATCH 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots

2022-12-20 Thread Marc Zyngier
Recent developments on the EFI front have resulted in guests that
simply won't boot if the page tables are in a read-only memslot and
you're a bit unlucky in the way S2 gets paged in... The core
issue is related to the fact that we treat a S1PTW as a write, which
is close enough to what needs to be done. Until you get to RO memslots.

The first patch fixes this and is definitely a stable candidate. It
splits the faulting of page tables in two steps (RO translation fault,
followed by a writable permission fault -- should it even happen).
The second one is a potential optimisation. I'm not even sure it is
worth it. The last patch is totally optional, only tangentially
related, and randomly repainting stuff (maybe that's contagious, who
knows).

The whole thing is on top of Linus' tree as of today. The reason for
this very random choice is that there is a patch in v6.1-rc7 that
hides the problem, and that patch is reverted in rc8 (see commit
0ba09b1733878afe838fe35c310715fda3d46428). I also wanted to avoid
conflicts with kvmarm/next, so here you go.

I've tested the series on A55, M1 and M2. The original issue seems to
trigger best with 16kB pages, so please test with *other* page sizes!

M.

Marc Zyngier (3):
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*

 arch/arm64/include/asm/esr.h|  9 
 arch/arm64/include/asm/kvm_arm.h| 15 ---
 arch/arm64/include/asm/kvm_emulate.h| 60 -
 arch/arm64/kvm/hyp/include/hyp/fault.h  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/mmu.c| 21 +
 6 files changed, 71 insertions(+), 38 deletions(-)

-- 
2.34.1



[PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

2022-12-20 Thread Marc Zyngier
A recent development on the EFI front has resulted in guests having
their page tables baked in the firmware binary, and mapped into
the IPA space as part as a read-only memslot.

Not only this is legitimate, but it also results in added security,
so thumbs up. However, this clashes mildly with our handling of a S1PTW
as a write to correctly handle AF/DB updates to the S1 PTs, and results
in the guest taking an abort it won't recover from (the PTs mapping the
vectors will suffer from the same problem...).

So clearly our handling is... wrong.

Instead, switch to a two-pronged approach:

- On S1PTW translation fault, handle the fault as a read

- On S1PTW permission fault, handle the fault as a write

This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use AF/DB anyway, as that'd be wrong.

Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end-up
with two back-to-back faults (page being evicted and faulted back).
I don't think this is a case worth optimising for.

Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault 
on instruction fetch")
Signed-off-by: Marc Zyngier 
Cc: sta...@vger.kernel.org
---
 arch/arm64/include/asm/kvm_emulate.h | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..fd6ad8b21f85 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
-   if (kvm_vcpu_abt_iss1tw(vcpu))
-   return true;
+   if (kvm_vcpu_abt_iss1tw(vcpu)) {
+   /*
+* Only a permission fault on a S1PTW should be
+* considered as a write. Otherwise, page tables baked
+* in a read-only memslot will result in an exception
+* being delivered in the guest.
+*
+* The drawback is that we end up faulting twice if the
+* guest is using any of HW AF/DB: a translation fault
+* to map the page containing the PT (read only at
+* first), then a permission fault to allow the flags
+* to be set.
+*/
+   switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+   case ESR_ELx_FSC_PERM:
+   return true;
+   default:
+   return false;
+   }
+   }
 
if (kvm_vcpu_trap_is_iabt(vcpu))
return false;
-- 
2.34.1



Re: [PATCH] KVM: arm64: Synchronize SMEN on vcpu schedule out

2022-12-20 Thread Marc Zyngier
+ Mark

On Tue, 20 Dec 2022 10:50:24 +,
Dong Bo  wrote:
> 
> From: Nianyao Tang 
> 
> If we have VHE and need to reenable SME for host in
> kvm_arch_vcpu_put_fp, CPACR.SMEN is modified from 0 to 1. Trap
> control for reading SVCR is modified from enable to disable.
> Synchronization is needed before reading SVCR later in
> fpsimd_save, or it may cause a sync exception which cannot be
> handled by the host.
> 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Alexandru Elisei 
> Cc: Suzuki K Poulose 
> Cc: Oliver Upton 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Signed-off-by: Nianyao Tang 
> ---
>  arch/arm64/kvm/fpsimd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 02dd7e9ebd39..f5799f571317 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -184,6 +184,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
>   sysreg_clear_set(CPACR_EL1,
>CPACR_EL1_SMEN_EL0EN,
>CPACR_EL1_SMEN_EL1EN);
> + isb();
>   }
>  
>   if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
> -- 
> 1.8.3.1
> 
> 

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v3 1/7] arm64/sysreg: Convert CCSIDR_EL1 to automatic generation

2022-12-19 Thread Marc Zyngier
On Mon, 19 Dec 2022 15:00:15 +,
Mark Brown  wrote:
> 
> [1  ]
> On Sun, Dec 18, 2022 at 01:11:01PM +0000, Marc Zyngier wrote:
> > Akihiko Odaki  wrote:
> 
> > > arch/arm64/tools/gen-sysreg.awk does not allow a hole and requires all
> > > bits are described hence these descriptions. If you have an
> > > alternative idea I'd like to hear.
> 
> > I'd simply suggest creating an UNKNOWN field encompassing bits
> > [31:28]. Alternatively, feel free to try the patch below, which allows
> > you to describe these 4 bits as "Unkn   31:28", similar to Res0/Res1.
> 
> I agree, where practical we should add new field types and other
> features as needed rather than trying to shoehorn things into what the
> tool currently supports.  It is very much a work in progress which can't
> fully represent everything in the spec yet.  For things like the
> registers with multiple possible views it's much more effort which
> shouldn't get in the way of progress on features but with something like
> this just updating the tool so we can match the architecture spec is the
> right thing.

I was tempted to add a Namespace tag that wouldn't generate the sysreg
#defines, but only generate the fields with a feature-specific
namespace. For example:

Sysreg  CCSIDR_EL1  3   1   0   0   0
Res063:32
Unkn31:28
Field   27:13   NumSets
Field   12:3Associativity
Field   2:0 LineSize
EndSysreg

Namespace CCIDX CCSIDR_EL1
Res063:56
Field   55:32   NumSets
Res031:25
Field   24:3Associativity
Field   2:0 LineSize
EndSysreg

the latter generating:

#define CCSIDR_EL1_CCIDX_RES0		(GENMASK(63, 56) | GENMASK(31, 25))
#define CCSIDR_EL1_CCIDX_NumSets	GENMASK(55, 32)
#define CCSIDR_EL1_CCIDX_Associativity	GENMASK(24, 3)
#define CCSIDR_EL1_CCIDX_LineSize	GENMASK(2, 0)

Thoughts?
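
If masks like these were generated, consumers would use them exactly
like today's *_MASK macros. A standalone illustration (the
CCSIDR_EL1_CCIDX_* names come from the hypothetical output above, and
field_get() is a stand-in for the kernel's FIELD_GET()):

#include <stdint.h>
#include <stdio.h>

/* Masks matching the hypothetical generated output above */
#define CCSIDR_EL1_CCIDX_NumSets	(((UINT64_C(1) << 24) - 1) << 32)	/* 55:32 */
#define CCSIDR_EL1_CCIDX_Associativity	(((UINT64_C(1) << 22) - 1) << 3)	/* 24:3 */
#define CCSIDR_EL1_CCIDX_LineSize	UINT64_C(0x7)				/* 2:0 */

/* Stand-in for FIELD_GET(): shift down by the mask's lowest set bit. */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

int main(void)
{
	/* Made-up value: raw encodings 127/3/2, i.e. 128 sets, 4-way, 64-byte lines. */
	uint64_t ccsidr = (UINT64_C(127) << 32) | (UINT64_C(3) << 3) | 2;

	printf("NumSets=%llu Associativity=%llu LineSize=%llu\n",
	       (unsigned long long)field_get(CCSIDR_EL1_CCIDX_NumSets, ccsidr),
	       (unsigned long long)field_get(CCSIDR_EL1_CCIDX_Associativity, ccsidr),
	       (unsigned long long)field_get(CCSIDR_EL1_CCIDX_LineSize, ccsidr));
	return 0;
}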

> 
> > Define an 'Unkn' field type modeled after the Res0/Res1 types
> > to allow such description. This allows the generation of
> 
> I'd be tempted to spell out Unknown fully since Unkn is not such a
> common abbreviation but I can see the desire to keep the name shorter
> and it doesn't really matter so either way:
> 
> Reviewed-by: Mark Brown 

Yeah, this stuff is write-only most of the time, and I like my fields
aligned if at all possible.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/3] arm64/sysreg: Add CCSIDR2_EL1

2022-12-19 Thread Marc Zyngier

On 2022-12-19 14:50, Mark Brown wrote:

On Mon, Dec 19, 2022 at 02:47:25PM +, Marc Zyngier wrote:


Since you're reviewing some of this, please have a look at v3[1],
which outlined a limitation of the sysreg generation tool as well
as a potential fix.


Hrm, would've been nice to be CCed on stuff for the tool :/


Apologies for missing the Cc update. I'll add you to the list
next time.

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 1/3] arm64/sysreg: Add CCSIDR2_EL1

2022-12-19 Thread Marc Zyngier

On 2022-12-19 13:12, Mark Brown wrote:

On Sun, Dec 11, 2022 at 02:16:58PM +0900, Akihiko Odaki wrote:

CCSIDR2_EL1 was added with FEAT_CCIDX.


This corresponds to the definition in DDI0487I.a.

Reviewed-by: Mark Brown 


Since you're reviewing some of this, please have a look at v3[1],
which outlined a limitation of the sysreg generation tool as well
as a potential fix.

Thanks,

M.

[1] 
https://lore.kernel.org/r/20221218051412.384657-2-akihiko.od...@daynix.com

--
Jazz is not dead. It just smells funny...


Re: [PATCH v3 5/7] KVM: arm64: Allow user to set CCSIDR_EL1

2022-12-18 Thread Marc Zyngier
On Sun, 18 Dec 2022 05:14:10 +,
Akihiko Odaki  wrote:
> 
> Allow the userspace to set CCSIDR_EL1 so that if the kernel changes the
> default values of CCSIDR_EL1, the userspace can restore the old values
> from an old saved VM context.
> 
> Suggested-by: Marc Zyngier 
> Signed-off-by: Akihiko Odaki 
> ---
>  arch/arm64/include/asm/kvm_host.h |   3 +
>  arch/arm64/kvm/reset.c|   1 +
>  arch/arm64/kvm/sys_regs.c | 116 --
>  3 files changed, 83 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index cc2ede0eaed4..cfc6930efe1b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -417,6 +417,9 @@ struct kvm_vcpu_arch {
>   u64 last_steal;
>   gpa_t base;
>   } steal;
> +
> + /* Per-vcpu CCSIDR override or NULL */
> + u32 *ccsidr;
>  };
>  
>  /*
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 5ae18472205a..7980983dbad7 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -157,6 +157,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
>   if (sve_state)
>   kvm_unshare_hyp(sve_state, sve_state + 
> vcpu_sve_state_size(vcpu));
>   kfree(sve_state);
> + kfree(vcpu->arch.ccsidr);
>  }
>  
>  static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f4a7c5abcbca..f48a3cc38d24 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -87,11 +87,27 @@ static u32 cache_levels;
>  /* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
>  #define CSSELR_MAX 14
>  
> +static u8 get_min_cache_line_size(u32 csselr)
> +{
> + u64 ctr_el0;
> + int field;
> +
> + ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
> + field = csselr & CSSELR_EL1_InD ? CTR_EL0_IminLine_SHIFT : 
> CTR_EL0_DminLine_SHIFT;
> +
> + return cpuid_feature_extract_unsigned_field(ctr_el0, field) - 2;
> +}
> +
>  /* Which cache CCSIDR represents depends on CSSELR value. */
> -static u32 get_ccsidr(u32 csselr)
> +static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
>  {
> + u32 ccsidr_index = csselr & (CSSELR_EL1_Level | CSSELR_EL1_InD);
>   u32 ccsidr;
>  
> + if (vcpu->arch.ccsidr && is_valid_cache(ccsidr_index) &&
> + !(kvm_has_mte(vcpu->kvm) && (csselr & CSSELR_EL1_TnD)))
> + return vcpu->arch.ccsidr[ccsidr_index];
> +

I really don't understand this logic. If the requested cache level is
invalid, or the MTE setup doesn't match, you return something that is
part of the HW hierarchy, despite having a userspace-provided
hierarchy.

The other problem I can see here is that you're still relying on the
host CLIDR_EL1 (aka cache_levels), while restoring a guest cache
hierarchy must include a CLIDR_EL1. Otherwise, you cannot really
evaluate the validity of that hierarchy, nor return consistent
results.

I was expecting something like (totally untested, but you'll get what
I mean):

if (vcpu->arch.ccsidr) {
if (!is_valid_cache(vcpu, csselr))
return 0; // UNKNOWN value

return vcpu->arch.ccsidr[ccsidr_index];
}

and with is_valid_cache() written as:

bool is_valid_cache(struct kvm_vcpu *vcpu, u64 csselr)
{
u64 clidr = __vcpu_sys_reg(vcpu, CLIDR_EL1);
u64 idx = FIELD_GET(CSSELR_EL1_Level, csselr);
u64 ttype = FIELD_GET(GENMASK(CLIDR_EL1_Ttypen_SHIFT + idx * 2 + 1,
  CLIDR_EL1_Ttypen_SHIFT + idx * 2),
  clidr);
u64 ctype = FIELD_GET(CLIDR_EL1_Ctype1 << (idx * 3), clidr);

// !MTE or InD make TnD RES0
if (!kvm_has_mte(vcpu->kvm) || (csselr & CSSELR_EL1_InD))
csselr &= ~CSSELR_EL1_TnD;

// If TnD is set, the cache level must be purely for tags
if (csselr & CSSELR_EL1_TnD)
return (ttype == 0b01);

// Otherwise, check for a match against the InD value
switch (ctype) {
case 0: /* No cache */
return false;
case 1: /* Instruction cache only */
return (csselr & CSSELR_EL1_InD);
case 2: /* Data cache only */
case 4: /* Unified cache */
return !(csselr & CSSELR_EL1_InD);
case 3: /* Separate instruction and data caches */
return true;
default: /* Reserved: we can't know instruction or data. */
return false;
}
}

which implies that CLIDR_EL1 isn't an invariant anymore. You have

Re: [PATCH v3 1/7] arm64/sysreg: Convert CCSIDR_EL1 to automatic generation

2022-12-18 Thread Marc Zyngier
On Sun, 18 Dec 2022 11:35:12 +,
Akihiko Odaki  wrote:
> 
> On 2022/12/18 20:23, Marc Zyngier wrote:
> > On Sun, 18 Dec 2022 05:14:06 +,
> > Akihiko Odaki  wrote:
> >> 
> >> Convert CCSIDR_EL1 to automatic generation as per DDI0487I.a. The field
> >> definition is for case when FEAT_CCIDX is not implemented. Fields WT,
> >> WB, RA and WA are defined as per A.j since they are now reserved and
> >> may have UNKNOWN values in I.a, which the file format cannot represent.
> >> 
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >>   arch/arm64/include/asm/sysreg.h |  1 -
> >>   arch/arm64/tools/sysreg | 11 +++
> >>   2 files changed, 11 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/arch/arm64/include/asm/sysreg.h 
> >> b/arch/arm64/include/asm/sysreg.h
> >> index 7d301700d1a9..910e960661d3 100644
> >> --- a/arch/arm64/include/asm/sysreg.h
> >> +++ b/arch/arm64/include/asm/sysreg.h
> >> @@ -425,7 +425,6 @@
> >> #define SYS_CNTKCTL_EL1sys_reg(3, 0, 14, 1,
> >> 0)
> >>   -#define SYS_CCSIDR_EL1  sys_reg(3, 1, 0, 0, 0)
> >>   #define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
> >> #define SYS_RNDR_EL0   sys_reg(3, 3, 2, 4, 0)
> >> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> >> index 384757a7eda9..acc79b5ccf92 100644
> >> --- a/arch/arm64/tools/sysreg
> >> +++ b/arch/arm64/tools/sysreg
> >> @@ -871,6 +871,17 @@ SysregSCXTNUM_EL1 3   0   13  
> >> 0   7
> >>   Field63:0SoftwareContextNumber
> >>   EndSysreg
> >>   +Sysreg  CCSIDR_EL1  3   1   0   0   0
> >> +Res0  63:32
> >> +Field 31:31   WT
> >> +Field 30:30   WB
> >> +Field 29:29   RA
> >> +Field 28:28   WA
> > 
> > For fields described as a single bit, the tool supports simply
> > indicating the bit number (28 rather than 28:28).
> > 
> > However, I strongly recommend against describing fields that have been
> > dropped from the architecture.  This only happens when these fields
> > are never used by any implementation, so describing them is at best
> > useless.
> 
> arch/arm64/tools/gen-sysreg.awk does not allow a hole and requires all
> bits are described hence these descriptions. If you have an
> alternative idea I'd like to hear.

I'd simply suggest creating an UNKNOWN field encompassing bits
[31:28]. Alternatively, feel free to try the patch below, which allows
you to describe these 4 bits as "Unkn   31:28", similar to Res0/Res1.

>
> > 
> >> +Field 27:13   NumSets
> >> +Field 12:3Associavity

Also, you may want to fix the typo here (Associativity).

Thanks,

M.

From 3112be25ec785de4c92d11d5964d54f216a2289c Mon Sep 17 00:00:00 2001
From: Marc Zyngier 
Date: Sun, 18 Dec 2022 12:55:23 +
Subject: [PATCH] arm64: Allow the definition of UNKNOWN system register fields

The CCSIDR_EL1 register contains an UNKNOWN field (which replaces
fields that were actually defined in previous revisions of the
architecture).

Define an 'Unkn' field type modeled after the Res0/Res1 types
to allow such description. This allows the generation of

  #define CCSIDR_EL1_UNKN (UL(0) | GENMASK_ULL(31, 28))

which may have its use one day. Hopefully the architecture doesn't
add too many of those in the future.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/tools/gen-sysreg.awk | 20 +++-
 arch/arm64/tools/sysreg |  2 ++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/tools/gen-sysreg.awk b/arch/arm64/tools/gen-sysreg.awk
index c350164a3955..e1df4b956596 100755
--- a/arch/arm64/tools/gen-sysreg.awk
+++ b/arch/arm64/tools/gen-sysreg.awk
@@ -98,6 +98,7 @@ END {
 
res0 = "UL(0)"
res1 = "UL(0)"
+   unkn = "UL(0)"
 
next_bit = 63
 
@@ -112,11 +113,13 @@ END {
 
define(reg "_RES0", "(" res0 ")")
define(reg "_RES1", "(" res1 ")")
+   define(reg "_UNKN", "(" unkn ")")
print ""
 
reg = null
res0 = null
res1 = null
+   unkn = null
 
next
 }
@@ -134,6 +137,7 @@ END {
 
res0 = "UL(0)"
res1 = "UL(0)"
+   unkn = "UL(0)"
 
define("REG_" reg, "S" op0 "_" op1 "_C" crn "_C" crm "_" op2)
define("SYS_" reg, "sys_reg(" op

Re: [PATCH v3 1/7] arm64/sysreg: Convert CCSIDR_EL1 to automatic generation

2022-12-18 Thread Marc Zyngier
On Sun, 18 Dec 2022 05:14:06 +,
Akihiko Odaki  wrote:
> 
> Convert CCSIDR_EL1 to automatic generation as per DDI0487I.a. The field
> definition is for case when FEAT_CCIDX is not implemented. Fields WT,
> WB, RA and WA are defined as per A.j since they are now reserved and
> may have UNKNOWN values in I.a, which the file format cannot represent.
>
> Signed-off-by: Akihiko Odaki 
> ---
>  arch/arm64/include/asm/sysreg.h |  1 -
>  arch/arm64/tools/sysreg | 11 +++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 7d301700d1a9..910e960661d3 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -425,7 +425,6 @@
>  
>  #define SYS_CNTKCTL_EL1  sys_reg(3, 0, 14, 1, 0)
>  
> -#define SYS_CCSIDR_EL1   sys_reg(3, 1, 0, 0, 0)
>  #define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
>  
>  #define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0)
> diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
> index 384757a7eda9..acc79b5ccf92 100644
> --- a/arch/arm64/tools/sysreg
> +++ b/arch/arm64/tools/sysreg
> @@ -871,6 +871,17 @@ Sysreg   SCXTNUM_EL1 3   0   13  0   
> 7
>  Field63:0SoftwareContextNumber
>  EndSysreg
>  
> +Sysreg   CCSIDR_EL1  3   1   0   0   0
> +Res0 63:32
> +Field31:31   WT
> +Field30:30   WB
> +Field29:29   RA
> +Field28:28   WA

For fields described as a single bit, the tool supports simply
indicating the bit number (28 rather than 28:28).

However, I strongly recommend against describing fields that have been
dropped from the architecture.  This only happens when these fields
are never used by any implementation, so describing them is at best
useless.

> +Field27:13   NumSets
> +Field12:3Associavity
> +Field2:0 LineSize
> +EndSysreg
> +

I don't think we have a good solution for overlapping fields that
depend on other factors, either contextual (such as a mode that
changes the layout of a sysreg), or architecture warts such as
FEAT_CCIDX (which changes the layout of a well-known sysreg).

At least, put a comment here that indicates the context of the
description.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2

2022-12-15 Thread Marc Zyngier
On Thu, 15 Dec 2022 00:52:28 +,
Oliver Upton  wrote:
> 
> On Tue, Dec 06, 2022 at 01:59:18PM +, Ryan Roberts wrote:
> > (apologies, I'm resending this series as I managed to send the cover 
> > letter to
> > all but the following patches only to myself on first attempt).
> > 
> > This is my first upstream feature submission so please go easy ;-)
> 
> Welcome :)
> 
> > Support 52-bit Output Addresses: FEAT_LPA2 changes the format of
> > the PTEs. The HW advertises support for LPA2 independently for
> > stage 1 and stage 2, and therefore its possible to have it for one
> > and not the other. I've assumed that there is a valid case for
> > this if stage 1 is not supported but stage 2 is, KVM could still
> > then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> > then be consumed by a 64KB page guest kernel with the help of
> > FEAT_LPA). Because of this independence and the fact that the kvm
> > pgtable library is used for both stage 1 and stage 2 tables, this
> > means the library now has to remember the in-use format on a
> > per-page-table basis. To do this, I had to rework some functions
> > to take a `struct kvm_pgtable *` parameter, and as a result, there
> > is a noisy patch to add this parameter.
> 
> Mismatch between the translation stages is an interesting problem...
> 
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
> 
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.
> 
> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?

My recollection is hazy, but LPA came first, and LVA only landed much
later (because the two features were made independent in the
architecture, something that was later abandoned for LPA2, which
implies large VAs as well).

So yes, the VMM can place memory wherever it wants in the 52bit IPA
space, even if its own VA space is limited to 48 bits. And it doesn't
have to be memory, by the way. You could place all the emulated MMIO
above the 48bit limit, for example, and that doesn't require any trick
other than the HW supporting 52bit PAs, and VTCR_EL2 being correctly
configured.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 3/3] KVM: arm64: Normalize cache configuration

2022-12-14 Thread Marc Zyngier
On Sun, 11 Dec 2022 05:17:00 +,
Akihiko Odaki  wrote:
> 
> Before this change, the cache configuration of the physical CPU was
> exposed to vcpus. This is problematic because the cache configuration a
> vcpu sees varies when it migrates between physical CPUs with different cache
> configurations.
> 
> Fabricate cache configuration from arm64_ftr_reg_ctrel0.sys_val, which

s/arm64_ftr_reg_ctrel0.sys_val/the sanitised value/

> holds the CTR_EL0 value the userspace sees regardless of which physical
> CPU it resides on.
> 
> HCR_TID2 is now always set as it is troublesome to detect the difference
> of cache configurations among physical CPUs.
> 
> CSSELR_EL1 is now held in the memory instead of the corresponding
> phyisccal register as the fabricated cache configuration may have a

nit: physical

> cache level which does not exist in the physical CPU, and setting the
> physical CSSELR_EL1 for the level results in an UNKNOWN behavior.

Not quite UNKNOWN behaviour. You could get an UNKNOWN value when
reading CCSIDR_EL1, or an UNDEF (or even the instruction executed as a
NOP). But CSSELR_EL1 doesn't have any restriction other than returning
an UNKNOWN value if you write crap to it and try to read it back.

The difference is subtle, but important: an UNKNOWN behaviour could
result in the machine catching fire, for example, and we don't really
want that... ;-)

> 
> CLIDR_EL1 and CCSIDR_EL1 are now writable from the userspace so that
> the VMM can restore the values saved with the old kernel.
> 
> Suggested-by: Marc Zyngier 
> Signed-off-by: Akihiko Odaki 
> ---
>  arch/arm64/include/asm/kvm_arm.h   |   3 +-
>  arch/arm64/include/asm/kvm_emulate.h   |   4 -
>  arch/arm64/include/asm/kvm_host.h  |   6 +-
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h |   2 -
>  arch/arm64/kvm/reset.c |   1 +
>  arch/arm64/kvm/sys_regs.c  | 232 -
>  6 files changed, 142 insertions(+), 106 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index 8aa8492dafc0..44be46c280c1 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -81,11 +81,12 @@
>   * SWIO: Turn set/way invalidates into set/way clean+invalidate
>   * PTW:  Take a stage2 fault if a stage1 walk steps in device 
> memory
>   * TID3: Trap EL1 reads of group 3 ID registers
> + * TID2: Trap CTR_EL0, CCSIDR2_EL1, CLIDR_EL1, and CSSELR_EL1
>   */
>  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
>HCR_BSU_IS | HCR_FB | HCR_TACR | \
>HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
> -  HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 )
> +  HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 | HCR_TID2)
>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
>  #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
>  #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 9bdba47f7e14..30c4598d643b 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -88,10 +88,6 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>   if (vcpu_el1_is_32bit(vcpu))
>   vcpu->arch.hcr_el2 &= ~HCR_RW;
>  
> - if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
> - vcpu_el1_is_32bit(vcpu))
> - vcpu->arch.hcr_el2 |= HCR_TID2;
> -
>   if (kvm_has_mte(vcpu->kvm))
>   vcpu->arch.hcr_el2 |= HCR_ATA;
>  }
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 45e2136322ba..27abf81c6910 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -178,6 +178,7 @@ struct kvm_vcpu_fault_info {
>  enum vcpu_sysreg {
>   __INVALID_SYSREG__,   /* 0 is reserved as an invalid value */
>   MPIDR_EL1,  /* MultiProcessor Affinity Register */
> + CLIDR_EL1,  /* Cache Level ID Register */
>   CSSELR_EL1, /* Cache Size Selection Register */
>   SCTLR_EL1,  /* System Control Register */
>   ACTLR_EL1,  /* Auxiliary Control Register */
> @@ -417,6 +418,9 @@ struct kvm_vcpu_arch {
>   u64 last_steal;
>   gpa_t base;
>   } steal;
> +
> + /* Per-vcpu CCSIDR override or NULL */
> + u32 *ccsidr;
>  };
>  
>  /*
> @@ -621,7 +625,6 @@ static inline bool __vcpu_read_sys_reg_from_cpu(int reg, 
> u64 *val)
>   return false;
>  
>   

Re: [PATCH 06/14] KVM: selftests: Rename UNAME_M to ARCH_DIR, fill explicitly for x86

2022-12-14 Thread Marc Zyngier

On 2022-12-13 20:03, Sean Christopherson wrote:

One last thought/question, what do y'all think about renaming directories
to follow the kernel proper?  I.e. aarch64=>arm64, s390x=>s390, and
x86_64=>x86.  Then $(ARCH_DIR) would go away.  The churn would be
unfortunate, but it would be nice to align with arch/ and tools/arch/.


aarch64->arm64 makes sense to me. Whether it is worth the churn
is another question. As long as we don't try to backport tests,
the damage should be limited to a single merge window.

  M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 0/1] KVM: arm64: PMU: Fix PMCR_EL0 reset value

2022-12-12 Thread Marc Zyngier
On Fri, 9 Dec 2022 16:44:45 +, James Clark wrote:
> We noticed qemu failing to run because of an assert on our CI. I don't see 
> the issue anymore with
> this fix.
> 
> Applies to kvmarm/next (753d734f3f34)
> 
> Thanks
> 
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: PMU: Fix PMCR_EL0 reset value
  commit: aff234839f8b80ac101e6c2f14d0e44b236efa48

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [kvm-unit-tests PATCH 1/3] arm: pmu: Fix overflow checks for PMUv3p5 long counters

2022-12-12 Thread Marc Zyngier
Alex,

On Sun, 11 Dec 2022 11:40:39 +,
Alexandru Elisei  wrote:
> 
> A simple "hey, you're wrong here, the PMU extensions do not follow the
> principles of the ID scheme for fields in ID registers" would have
> sufficed.

This is what I did, and saved you the hassle of looking it up.

> Guess you never made a silly mistake ever, right?

It's not so much about making a silly mistake. I do that all the time.
But it is about the way you state these things, and the weight that
your reviews carry. You're a trusted reviewer, with a lot of
experience, and posting with an @arm.com address: what you say in a
public forum sticks. When you assert that the author is wrong, they
will take it at face value.

> Otherwise, good job encouraging people to help review KVM/arm64 patches ;)

Which is worse: no review, or a review that spreads confusion?
Think about it. I'm all for being nice, but I will call bullshit when
I see it asserted by people with a certain level of authority.

And I've long made up my mind about the state of the KVM/arm64 review
process -- reviews rarely come from people who have volunteered to do
so, but instead from those who have either a vested interest in it, or
an ulterior motive. Hey ho...

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/3] KVM: arm64: Handle CCSIDR associativity mismatches

2022-12-11 Thread Marc Zyngier
On Sun, 11 Dec 2022 05:25:31 +,
Akihiko Odaki  wrote:
> 
> On 2022/12/04 23:57, Marc Zyngier wrote:
> > On Fri, 02 Dec 2022 09:55:24 +,
> > Akihiko Odaki  wrote:
> >> 
> >> On 2022/12/02 18:40, Marc Zyngier wrote:
> >>> On Fri, 02 Dec 2022 05:17:12 +,
> >>> Akihiko Odaki  wrote:
> >>>> 
> >>>>>> On M2 MacBook Air, I have seen no other difference in standard ID
> >>>>>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> >>>>>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> >>>>>> that without more analysis. This is still enough to migrate vCPU
> >>>>>> running Linux at least.
> >>>>> 
> >>>>> I guess that MacOS hides more of the underlying HW than KVM does. And
> >>>>> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
> >>>>> different between the two clusters.
> >>>> 
> >>>> It seems KVM stores a MIDR value of a CPU and reuse it as "invariant"
> >>>> value for ioctls while it exposes the MIDR value each physical CPU
> >>>> owns to vCPU.
> >>> 
> >>> This only affects the VMM though, and not the guest which sees the
> >>> MIDR of the CPU it runs on. The problem is that, short of pinning
> >>> the vcpus, you don't know where they will run. So any value is fair
> >>> game.
> >> 
> >> Yes, my concern is that VMM can be confused if it sees something
> >> different from what the guest on the vCPU sees.
> > 
> > Well, this has been part of the ABI for about 10 years, since Rusty
> > introduced this notion of invariant, so userspace is already working
> > around it if that's an actual issue.
> 
> In that case, I think it is better to document that the interface is
> not working properly and deprecated.

This means nothing. Deprecating an API doesn't mean we don't support
it and doesn't solve any issue for existing userspace.

I'd rather not change anything, TBH. Existing userspace already knows
how to deal with this,

> 
> > 
> > This would be easily addressed though, and shouldn't result in any
> > issue. The following should do the trick (only lightly tested on an
> > M1).
> 
> This can be problematic when restoring vcpu state saved with the old
> kernel. A possible solution is to allow the userspace to overwrite
> MIDR_EL1 as proposed for CCSIDR_EL1.

That would break most guests for obvious reasons. At best what can be
done is to make the MIDR WI.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/1] KVM: arm64: PMU: Fix PMCR_EL0 reset value

2022-12-10 Thread Marc Zyngier
On Fri, 09 Dec 2022 17:58:31 +,
Oliver Upton  wrote:
> 
> On Fri, Dec 09, 2022 at 04:44:46PM +, James Clark wrote:
> > ARMV8_PMU_PMCR_N_MASK is an unshifted value which results in the wrong
> > reset value for PMCR_EL0, so shift it to fix it.
> 
> That's just mean. *_MASK tends to be a shifted mask, although it would
> appear that asm/perf_event.h does not follow this convention. Fixing
> that would be nice (as I'm sure somebody else will get burned by this),
> but for the sake of an immediate fix:

Well, that'll teach me the usual lesson: last minute changes without
full non-regression testing are bound to end in disaster.
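
To make the mix-up concrete, a minimal sketch (the mask and shift values
mirror asm/perf_event.h at the time; the helpers themselves are invented
for illustration, not the actual fix):

#define ARMV8_PMU_PMCR_N_SHIFT	11
#define ARMV8_PMU_PMCR_N_MASK	0x1f	/* unshifted: field width only */

/* Buggy: keeps PMCR_EL0 bits [4:0] (E/P/C/D/X) instead of N */
static inline u64 pmcr_reset_buggy(u64 hw_pmcr)
{
	return hw_pmcr & ARMV8_PMU_PMCR_N_MASK;
}

/* Intended: preserve only the read-only N field, bits [15:11] */
static inline u64 pmcr_reset_fixed(u64 hw_pmcr)
{
	return hw_pmcr & (ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
}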

> 
> > This fixes the following error when running qemu:
> > 
> >   $ qemu-system-aarch64 -cpu host -machine type=virt,accel=kvm -kernel ...
> > 
> >   target/arm/helper.c:1813: pmevcntr_rawwrite: Assertion `counter < 
> > pmu_num_counters(env)' failed.
> > 
> > Fixes: 292e8f149476 ("KVM: arm64: PMU: Simplify PMCR_EL0 reset handling")
> > Signed-off-by: James Clark 
> 
> Reviewed-by: Oliver Upton 

Thanks both. I'll queue that ASAP as a fix.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [kvm-unit-tests PATCH 1/3] arm: pmu: Fix overflow checks for PMUv3p5 long counters

2022-12-10 Thread Marc Zyngier
On Fri, 09 Dec 2022 17:47:14 +,
Alexandru Elisei  wrote:
> 
> Hi,
> 
> On Fri, Dec 02, 2022 at 04:55:25AM +, Ricardo Koller wrote:
> > PMUv3p5 uses 64-bit counters irrespective of whether the PMU is configured
> > for overflowing at 32 or 64-bits. The consequence is that tests that check
> > the counter values after overflowing should not assume that values will be
> > wrapped around 32-bits: they overflow into the other half of the 64-bit
> > counters on PMUv3p5.
> > 
> > Fix tests by correctly checking overflowing-counters against the expected
> > 64-bit value.
> > 
> > Signed-off-by: Ricardo Koller 
> > ---
> >  arm/pmu.c | 29 ++---
> >  1 file changed, 18 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arm/pmu.c b/arm/pmu.c
> > index cd47b14..eeac984 100644
> > --- a/arm/pmu.c
> > +++ b/arm/pmu.c
> > @@ -54,10 +54,10 @@
> >  #define EXT_COMMON_EVENTS_LOW  0x4000
> >  #define EXT_COMMON_EVENTS_HIGH 0x403F
> >  
> > -#define ALL_SET			0xFFFFFFFF
> > -#define ALL_CLEAR		0x0
> > -#define PRE_OVERFLOW		0xFFFFFFF0
> > -#define PRE_OVERFLOW2	0xFFFFFFDC
> > +#define ALL_SET			0x00000000FFFFFFFFULL
> > +#define ALL_CLEAR		0x0000000000000000ULL
> > +#define PRE_OVERFLOW		0x00000000FFFFFFF0ULL
> > +#define PRE_OVERFLOW2	0x00000000FFFFFFDCULL
> >  
> >  #define PMU_PPI23
> >  
> > @@ -538,6 +538,7 @@ static void test_mem_access(void)
> >  static void test_sw_incr(void)
> >  {
> > uint32_t events[] = {SW_INCR, SW_INCR};
> > +   uint64_t cntr0;
> > int i;
> >  
> > if (!satisfy_prerequisites(events, ARRAY_SIZE(events)))
> > @@ -572,9 +573,9 @@ static void test_sw_incr(void)
> > write_sysreg(0x3, pmswinc_el0);
> >  
> > isb();
> > -   report(read_regn_el0(pmevcntr, 0)  == 84, "counter #1 after + 100 
> > SW_INCR");
> > -   report(read_regn_el0(pmevcntr, 1)  == 100,
> > -   "counter #0 after + 100 SW_INCR");
> > +   cntr0 = (pmu.version < ID_DFR0_PMU_V3_8_5) ? 84 : PRE_OVERFLOW + 100;
> 
> Hm... in the Arm ARM it says that counters are 64-bit if PMUv3p5 is
> implemented.  But it doesn't say anywhere that versions newer than p5 are
> required to implement PMUv3p5.

And I don't think it needs to say it, because there is otherwise no
way for SW to discover whether 64bit counters are implemented or not.

> 
> For example, for PMUv3p7, it says that the feature is mandatory in Arm8.7
> implementations. My interpretation of that is that it is not forbidden for
> an implementer to cherry-pick this version on older versions of the
> architecture where PMUv3p5 is not implemented.

I'm sorry to have to say that, but I find your suggestion that PMUv3p7
could be implemented without supporting the full gamut of PMUv3p5
ludicrous.

Please look back at the ARM ARM, especially at the tiny section titled
"Alternative ID scheme used for the Performance Monitors Extension
version" (DDI0487I.a, D17.1.3, page 5553), and the snippet of C code
that performs exactly this check:


  if (value != 0xF && value >= number) {
// do something that relies on version 'number' of the feature
  }


Replace 'value' with 7 (PMUv3p7), 'number' with 6 (PMUv3p5), and you
get the exact property that you pretend doesn't exist, allowing you to
rely on PMUv3p5 to be implemented when the HW has PMUv3p7.
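
To make the check concrete for this case, a minimal rendering of that scheme
for the PMUVer field could look like the sketch below (field encodings per
the ID register description: 0x6 is PMUv3p5, 0x7 is PMUv3p7, 0xF is IMP DEF;
the helper name is made up here):

/* Does this PMU version imply the PMUv3p5 feature set? */
static bool pmuver_implies_pmuv3p5(unsigned int pmuver)
{
	/* 0xF is the IMPLEMENTATION DEFINED escape hatch: not architected */
	if (pmuver == 0xf)
		return false;

	/* Any architected value >= PMUv3p5, including PMUv3p7, qualifies */
	return pmuver >= 0x6;
}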

> Maybe the check should be pmu.version == ID_DFR0_PMU_V3_8_5, to match the
> counter definitions in the architecture?

No, that'd be totally wrong. You need to check your understanding of
how the ID registers work.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [GIT PULL] KVM/arm64 updates for 6.2

2022-12-06 Thread Marc Zyngier
On Tue, 06 Dec 2022 21:43:43 +,
Paolo Bonzini  wrote:
> 
> On 12/6/22 19:20, Mark Brown wrote:
> >> I almost suggested doing that on multiple occasions this cycle, but 
> >> ultimately
> >> decided not to because it would effectively mean splitting series that 
> >> touch KVM
> >> and selftests into different trees, which would create a different kind of
> >> dependency hell.  Or maybe a hybrid approach where series that only (or 
> >> mostly?)
> >> touch selftests go into a dedicated tree?
> > 
> > Some other subsystems do have a separate branch for kselftests.  One
> > fairly common occurrence is that the selftests branch ends up failing to
> > build independently because someone adds new ABI together with a
> > selftest but the patches adding the ABI don't end up on the same branch
> > as the tests which try to use them.  That is of course resolvable but
> > it's a common friction point.
> 
> Yeah, the right solution is simply to merge selftests changes
> separately from the rest and use topic branches.

Don't know if this is what you have in mind, but I think that we
should use topic branches for *everything*. The only things for which
I don't use a separate branch are the odd drive-by patches, of the
spelling fix persuasion.

That's what we do for arm64 and the IRQ subsystem. It is a bit more
involved at queuing time, but makes dropping series from -next
extremely easy, without affecting the history. And crucially, it gives
everyone a hint to base their stuff on a stable commit, not a random
"tip of kvm/queue as of three days ago".

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [GIT PULL] KVM/arm64 updates for 6.2

2022-12-06 Thread Marc Zyngier
On Tue, 06 Dec 2022 17:41:21 +,
Paolo Bonzini  wrote:
> 
> On 12/5/22 16:58, Marc Zyngier wrote:
> > - There is a lot of selftest conflicts with your own branch, see:
> > 
> >https://lore.kernel.org/r/20221201112432.4cb9a...@canb.auug.org.au
> >https://lore.kernel.org/r/20221201113626.438f1...@canb.auug.org.au
> >https://lore.kernel.org/r/20221201115741.7de32...@canb.auug.org.au
> >https://lore.kernel.org/r/20221201120939.3c19f...@canb.auug.org.au
> >https://lore.kernel.org/r/20221201131623.18ebc...@canb.auug.org.au
> > 
> >for a rather exhaustive collection.
> 
> Yeah, I saw them in Stephen's messages but missed your reply.
> 
> In retrospect, at least Gavin's series for memslot_perf_test should have
> been applied by both of us with a topic branch, but there's so many
> conflicts all over the place that it's hard to single out one series.
> It just happens.

I generally queue things on topic branches for my own sanity, happy to
make them available in the future.

> 
> The only conflict in non-x86 code is the following one, please check
> if I got it right.
> 
> diff --git a/tools/testing/selftests/kvm/aarch64/page_fault_test.c 
> b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
> index 05bb6a6369c2..0cda70bef5d5 100644
> --- a/tools/testing/selftests/kvm/aarch64/page_fault_test.c
> +++ b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
> @@ -609,6 +609,8 @@ static void setup_memslots(struct kvm_vm *vm, struct 
> test_params *p)
>   data_size / guest_page_size,
>   p->test_desc->data_memslot_flags);
>   vm->memslots[MEM_REGION_TEST_DATA] = TEST_DATA_MEMSLOT;
> +
> + ucall_init(vm, data_gpa + data_size);
>  }
>   static void setup_default_handlers(struct test_desc *test)
> @@ -704,8 +706,6 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   setup_gva_maps(vm);
>  -ucall_init(vm, NULL);
> -
>   reset_event_counts();
>   /*
> 
> 
> Special care is needed here because the test uses vm_create().
> 
> I haven't pushed to kvm/next yet to give you time to check, so the
> merge is currently in kvm/queue only.

There have been a couple of -next failures reported by broonie:

https://lore.kernel.org/r/20221206175916.250104-1-broo...@kernel.org
https://lore.kernel.org/r/20221206181506.252537-1-broo...@kernel.org

which I think you've received as well. The second patch is definitely
needed, but you've already solved the first one. At least things do
build.

> 
> > - For the 6.3 cycle, we are going to experiment with Oliver taking
> >care of most of the patch herding. I'm sure he'll do a great job,
> >but if there is the odd mistake, please cut him some slack and blame
> >me instead.
> 
> Absolutely - you both have all the slack you need, synchronization
> is harder than it seems.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2

2022-12-06 Thread Marc Zyngier
On Tue, 06 Dec 2022 12:06:00 +,
Ryan Roberts  wrote:
> 
> This is my first upstream feature submission so please go easy ;-)

Sure! Where are the patches? ;-)

M.

-- 
Without deviation from the norm, progress is not possible.


[GIT PULL] KVM/arm64 updates for 6.2

2022-12-05 Thread Marc Zyngier
EL1
  arm64/sysreg: Standardise naming for ID_ISAR6_EL1
  arm64/sysreg: Standardise naming for ID_PFR0_EL1
  arm64/sysreg: Standardise naming for ID_PFR1_EL1
  arm64/sysreg: Standardise naming for ID_PFR2_EL1
  arm64/sysreg: Standardise naming for ID_DFR0_EL1
  arm64/sysreg: Standardise naming for ID_DFR1_EL1
  arm64/sysreg: Standardise naming for MVFR0_EL1
  arm64/sysreg: Standardise naming for MVFR1_EL1
  arm64/sysreg: Standardise naming for MVFR2_EL1
  arm64/sysreg: Extend the maximum width of a register and symbol name
  arm64/sysreg: Convert ID_MMFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR3_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR4_EL1 to automatic generation
  arm64/sysreg: Convert ID_ISAR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_ISAR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_ISAR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_ISAR3_EL1 to automatic generation
  arm64/sysreg: Convert ID_ISAR4_EL1 to automatic generation
  arm64/sysreg: Convert ID_ISAR5_EL1 to automatic generation
  arm64/sysreg: Convert ID_ISAR6_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
  arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
  arm64/sysreg: Convert MVFR0_EL1 to automatic generation
  arm64/sysreg: Convert MVFR1_EL1 to automatic generation
  arm64/sysreg: Convert MVFR2_EL1 to automatic generation
  arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
  arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
  arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation

Marc Zyngier (32):
  Merge tag 'kvmarm-fixes-6.1-3' into kvm-arm64/dirty-ring
  arm64: Add ID_DFR0_EL1.PerfMon values for PMUv3p7 and IMP_DEF
  KVM: arm64: PMU: Align chained counter implementation with architecture 
pseudocode
  KVM: arm64: PMU: Always advertise the CHAIN event
  KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
  KVM: arm64: PMU: Narrow the overflow checking when required
  KVM: arm64: PMU: Only narrow counters that are not 64bit wide
  KVM: arm64: PMU: Add counter_index_to_*reg() helpers
  KVM: arm64: PMU: Simplify setting a counter to a specific value
  KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits
  KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation
  KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace
  KVM: arm64: PMU: Allow ID_DFR0_EL1.PerfMon to be set from userspace
  KVM: arm64: PMU: Implement PMUv3p5 long counter support
  KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest
  KVM: arm64: PMU: Simplify vcpu computation on perf overflow notification
  KVM: arm64: PMU: Make kvm_pmc the main data structure
  KVM: arm64: PMU: Simplify PMCR_EL0 reset handling
  KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit 
overflow
  Merge branch kvm-arm64/selftest/memslot-fixes into kvmarm-master/next
  Merge branch kvm-arm64/selftest/linked-bps into kvmarm-master/next
  Merge branch kvm-arm64/selftest/s2-faults into kvmarm-master/next
  Merge branch kvm-arm64/selftest/access-tracking into kvmarm-master/next
  Merge branch kvm-arm64/52bit-fixes into kvmarm-master/next
  Merge branch kvm-arm64/dirty-ring into kvmarm-master/next
  Merge branch kvm-arm64/parallel-faults into kvmarm-master/next
  Merge branch kvm-arm64/pkvm-vcpu-state into kvmarm-master/next
  Merge branch kvm-arm64/mte-map-shared into kvmarm-master/next
  Merge branch kvm-arm64/pmu-unchained into kvmarm-master/next
  Merge branch kvm-arm64/misc-6.2 into kvmarm-master/next
  Merge remote-tracking branch 'arm64/for-next/sysregs' into 
kvmarm-master/next

Oliver Upton (19):
  KVM: arm64: Combine visitor arguments into a context structure
  KVM: arm64: Stash observed pte value in visitor context
  KVM: arm64: Pass mm_ops through the visitor context
  KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
  KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
  KVM: arm64: Use an opaque type for pteps
  KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  KVM: arm64: Protect stage-2 traversal with RCU
  KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  KVM: arm64: Split init and set for table PTE
  KVM: arm64: Make block->table PTE changes parallel-aware
  KVM: arm64: Make leaf->leaf PTE changes

Re: [PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.

2022-12-05 Thread Marc Zyngier
On Mon, 5 Dec 2022 11:40:31 +, Ryan Roberts wrote:
> get_user_mapping_size() uses kvm's pgtable library to walk a user space
> page table created by the kernel, and in doing so, fakes up the metadata
> that the library needs, including ia_bits, which defines the size of the
> input address.
> 
> For the case where the kernel is compiled for 52 VA bits but runs on HW
> that does not support LVA, it will fall back to 48 VA bits at runtime.
> Therefore we must use vabits_actual rather than VA_BITS to get the true
> address size.
> 
> [...]

Applied to next, thanks!

[1/1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.
  commit: 219072c09abde0f1d0a6ce091be375e8eb7d08f0

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.

2022-12-05 Thread Marc Zyngier
Hi Ryan,

Thanks for that.

On Mon, 05 Dec 2022 11:40:31 +,
Ryan Roberts  wrote:
> 
> get_user_mapping_size() uses kvm's pgtable library to walk a user space
> page table created by the kernel, and in doing so, fakes up the metadata
> that the library needs, including ia_bits, which defines the size of the
> input address.

It isn't supposed to "fake" anything. It simply provides the
information that the walker needs to correctly parse the page tables.

> 
> For the case where the kernel is compiled for 52 VA bits but runs on HW
> that does not support LVA, it will fall back to 48 VA bits at runtime.
> Therefore we must use vabits_actual rather than VA_BITS to get the true
> address size.
> 
> This is benign in the current code base because the pgtable library only
> uses it for error checking.
> 
> Fixes: 6011cf68c885 ("KVM: arm64: Walk userspace page tables to compute
> the THP mapping size")

nit: this should appear on a single line, without a line-break in the
middle [1]...

>

... without a blank line between Fixes: and the rest of the tags.

And while I'm on the "trivial remarks" train, drop the full stop at
the end of the subject line.

> Signed-off-by: Ryan Roberts 
> ---
>  arch/arm64/kvm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 4efb983cff43..1ef0704420d9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -641,7 +641,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 
> addr)
>  {
>   struct kvm_pgtable pgt = {
>   .pgd= (kvm_pte_t *)kvm->mm->pgd,
> - .ia_bits= VA_BITS,
> + .ia_bits= vabits_actual,
>   .start_level= (KVM_PGTABLE_MAX_LEVELS -
>  CONFIG_PGTABLE_LEVELS),
>   .mm_ops = _user_mm_ops,
> --
> 2.25.1
> 
> 

Other than the above nits, this is well spotted. I need to regenerate
the kvmarm/next branch after the sysreg attack from James, so I'll try
and fold that in.

Thanks,

M.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n139

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v4 04/16] KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow

2022-12-05 Thread Marc Zyngier
On Thu, 01 Dec 2022 16:51:46 +,
Ricardo Koller  wrote:
> 
> On Thu, Dec 01, 2022 at 08:47:47AM -0800, Ricardo Koller wrote:
> > On Sun, Nov 13, 2022 at 04:38:20PM +, Marc Zyngier wrote:
> > > The PMU architecture makes a subtle difference between a 64bit
> > > counter and a counter that has a 64bit overflow. This is for example
> > > the case of the cycle counter, which can generate an overflow on
> > > a 32bit boundary if PMCR_EL0.LC==0 despite the accumulation being
> > > done on 64 bits.
> > > 
> > > Use this distinction in the few cases where it matters in the code,
> > > as we will reuse this with PMUv3p5 long counters.
> > > 
> > > Signed-off-by: Marc Zyngier 
> > > ---
> > >  arch/arm64/kvm/pmu-emul.c | 43 ---
> > >  1 file changed, 31 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > > index 69b67ab3c4bf..d050143326b5 100644
> > > --- a/arch/arm64/kvm/pmu-emul.c
> > > +++ b/arch/arm64/kvm/pmu-emul.c
> > > @@ -50,6 +50,11 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
> > >   * @select_idx: The counter index
> > >   */
> > >  static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > > +{
> > > + return (select_idx == ARMV8_PMU_CYCLE_IDX);
> > > +}
> > > +
> > > +static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 
> > > select_idx)
> > >  {
> > >   return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> > >   __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> > > @@ -57,7 +62,8 @@ static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, 
> > > u64 select_idx)
> > >  
> > >  static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
> > >  {
> > > - return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX);
> > > + return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX &&
> > > + !kvm_pmu_idx_has_64bit_overflow(vcpu, idx));
> > >  }
> > >  
> > >  static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
> > > @@ -97,7 +103,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, 
> > > u64 select_idx)
> > >   counter += perf_event_read_value(pmc->perf_event, &enabled,
> > >    &running);
> > >  
> > > - if (select_idx != ARMV8_PMU_CYCLE_IDX)
> > > + if (!kvm_pmu_idx_is_64bit(vcpu, select_idx))
> > >   counter = lower_32_bits(counter);
> > >  
> > >   return counter;
> > > @@ -423,6 +429,23 @@ static void kvm_pmu_counter_increment(struct 
> > > kvm_vcpu *vcpu,
> > >   }
> > >  }
> > >  
> > > +/* Compute the sample period for a given counter value */
> > > +static u64 compute_period(struct kvm_vcpu *vcpu, u64 select_idx, u64 
> > > counter)
> > > +{
> > > + u64 val;
> > > +
> > > + if (kvm_pmu_idx_is_64bit(vcpu, select_idx)) {
> > > + if (!kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx))
> > > + val = -(counter & GENMASK(31, 0));
> > 
> > If I understand things correctly, this might be missing another mask:
> > 
> > +   if (!kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx)) {
> > +   val = -(counter & GENMASK(31, 0));
> > +   val &= GENMASK(31, 0);
> > +   } else {
> > 
> > For example, if the counter is 64-bits wide, it overflows at 32-bits,
> > and it is _one_ sample away from overflowing at 32-bits:
> > 
> > 0x01010101_ffffffff
> > 
> > Then "val = (-counter) & GENMASK(63, 0)" would return 0x_0001.
> 
> Sorry, this should be:
> 
>   Then "val = -(counter & GENMASK(31, 0))" would return 0xffffffff_00000001.
> 
> > But the right period is 0x00000000_00000001 (it's one sample away from
> > overflowing).

Yup, this is a bit bogus. But this can be simplified by falling back
to the normal 32bit handling (on top of the pmu-unchained branch):

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index d8ea39943086..24908400e190 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -461,14 +461,10 @@ static u64 compute_period(struct kvm_pmc *pmc, u64 
counter)
 {
u64 val;
 
-   if (kvm_pmc_is_64bit(pmc)) {
-   if (!kvm_pmc_has_64bit_overflow(pmc))

Re: [PATCH] KVM: arm64: Always mask CCSIDR associativity bits

2022-12-04 Thread Marc Zyngier
Hey Akihiko,

Thanks for having had a look at this. A bunch of comments below.

On Fri, 02 Dec 2022 09:18:56 +,
Akihiko Odaki  wrote:
> 
> M2 MacBook Air has mismatched CCSIDR associativity bits among physical
> CPUs, which makes the bits a KVM vCPU sees inconsistent when migrating
> among them.

Your machine *does not* have any mismatched CCSIDR. By definition, any
CPU can have any cache hierarchy, and there is no architectural
requirement that they are all the same.

I'd rather you describe this in architectural terms, and simply point
out that KVM exposes the physical topology of the CPU the vcpu runs
on (including across migration, which is a problem), and that
userspace sees some arbitrary topology that has been sampled at boot
time. And both behaviours are a bit wrong in an asymmetric system.

This also breaks live migration for something that should never be a
concern of non-secure SW.

> 
> While it is possible to detect CCSIDR associativity bit mismatches and
> mask them with that condition, it requires mismatch detection and
> increases complexity. Instead, always mask the CCSIDR associativity bits
> to keep the code simple.

Given the above, this paragraph doesn't make much sense.

> 
> Also, allow the userspace to overwrite the bits with arbitrary values so
> that it can restore a vCPU state saved with an older kernel.
> 
> Signed-off-by: Akihiko Odaki 
> Suggested-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_arm.h |   3 +-
>  arch/arm64/include/asm/kvm_emulate.h |   4 -
>  arch/arm64/include/asm/kvm_host.h|   4 +
>  arch/arm64/include/asm/sysreg.h  |   3 +
>  arch/arm64/kvm/sys_regs.c| 146 ++-
>  5 files changed, 87 insertions(+), 73 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index 8aa8492dafc0..f69cd96a65ab 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -81,11 +81,12 @@
>   * SWIO: Turn set/way invalidates into set/way clean+invalidate
>   * PTW:  Take a stage2 fault if a stage1 walk steps in device 
> memory
>   * TID3: Trap EL1 reads of group 3 ID registers
> + * TID2: Trap CCSIDR_EL1

Not only that, but also CTR_EL0, CCSIDR2_EL1, CLIDR_EL1, and
CSSELR_EL1 if the guest is using AArch64, and CSSELR if the guest is
using AArch32.

>   */
>  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
>HCR_BSU_IS | HCR_FB | HCR_TACR | \
>HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
> -  HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 )
> +  HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 | HCR_TID2)
>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
>  #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
>  #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 9bdba47f7e14..30c4598d643b 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -88,10 +88,6 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>   if (vcpu_el1_is_32bit(vcpu))
>   vcpu->arch.hcr_el2 &= ~HCR_RW;
>  
> - if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
> - vcpu_el1_is_32bit(vcpu))
> - vcpu->arch.hcr_el2 |= HCR_TID2;
> -
>   if (kvm_has_mte(vcpu->kvm))
>   vcpu->arch.hcr_el2 |= HCR_ATA;
>  }
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 45e2136322ba..cc051cd56179 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -53,6 +53,9 @@
>  
>  #define KVM_HAVE_MMU_RWLOCK
>  
> +/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
> +#define CSSELR_MAX 14
> +
>  /*
>   * Mode of operation configurable with kvm-arm.mode early param.
>   * See Documentation/admin-guide/kernel-parameters.txt for more information.
> @@ -266,6 +269,7 @@ struct kvm_cpu_context {
>   struct user_fpsimd_state fp_regs;
>  
>   u64 sys_regs[NR_SYS_REGS];
> + u32 ccsidr[CSSELR_MAX + 1];

kvm_cpu_context is the wrong location for this stuff. We use it for
things that get actively context-switched. No such thing here, as this
is RO data as far as the guest is concerned.

Also, it would probably make some sense to only allocate this memory
if the vcpu is not using the default synthesised topology, but
something that userspace has restored.

>
>   struct kvm_vcpu *__hyp_running_vcpu;
>  };
> diff --git a/arch/arm64/include/asm/sysreg.h b/ar

Re: [PATCH 0/3] KVM: arm64: Handle CCSIDR associativity mismatches

2022-12-04 Thread Marc Zyngier
On Fri, 02 Dec 2022 09:55:24 +,
Akihiko Odaki  wrote:
> 
> On 2022/12/02 18:40, Marc Zyngier wrote:
> > On Fri, 02 Dec 2022 05:17:12 +,
> > Akihiko Odaki  wrote:
> >> 
> >>>> On M2 MacBook Air, I have seen no other difference in standard ID
> >>>> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> >>>> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> >>>> that without more analysis. This is still enough to migrate vCPU
> >>>> running Linux at least.
> >>> 
> >>> I guess that MacOS hides more of the underlying HW than KVM does. And
> >>> KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
> >>> different between the two clusters.
> >> 
> >> It seems KVM stores a MIDR value of a CPU and reuse it as "invariant"
> >> value for ioctls while it exposes the MIDR value each physical CPU
> >> owns to vCPU.
> > 
> > This only affects the VMM though, and not the guest which sees the
> > MIDR of the CPU it runs on. The problem is that, short of pinning
> > the vcpus, you don't know where they will run. So any value is fair
> > game.
> 
> Yes, my concern is that VMM can be confused if it sees something
> different from what the guest on the vCPU sees.

Well, this has been part of the ABI for about 10 years, since Rusty
introduced this notion of invariant, so userspace is already working
around it if that's an actual issue.

This would be easily addressed though, and shouldn't result in any
issue. The following should do the trick (only lightly tested on an
M1).

Thanks,

M.

From f1caacb89eb8ae40dc38669160a2f081f87f4b15 Mon Sep 17 00:00:00 2001
From: Marc Zyngier 
Date: Sun, 4 Dec 2022 14:22:22 +
Subject: [PATCH] KVM: arm64: Return MIDR_EL1 to userspace as seen on the vcpu
 thread

When booting, KVM samples the MIDR of the CPU it initialises on,
and keeps this as the value that will forever be exposed to userspace.

However, this has nothing to do with the value that the guest will
see. On an asymmetric system, this can result in userspace observing
weird things, especially if it has pinned the vcpus on a *different*
set of CPUs.

Instead, return the MIDR value for the vcpu we're currently on and
that the vcpu will observe if it has been pinned onto that CPU.

For symmetric systems, this changes nothing. Asymmetric machines will
observe the correct MIDR value at the point of the call.

Reported-by: Akihiko Odaki 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/sys_regs.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f4a7c5abcbca..f6bcf8ba9b2e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1246,6 +1246,22 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *rd,
return 0;
 }
 
+static int get_midr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+   u64 *val)
+{
+   *val = read_sysreg(midr_el1);
+   return 0;
+}
+
+static int set_midr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+   u64 val)
+{
+   if (val != read_sysreg(midr_el1))
+   return -EINVAL;
+
+   return 0;
+}
+
 static int get_raz_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
   u64 *val)
 {
@@ -1432,6 +1448,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
{ SYS_DESC(SYS_DBGVCR32_EL2), NULL, reset_val, DBGVCR32_EL2, 0 },
 
+   { SYS_DESC(SYS_MIDR_EL1), .get_user = get_midr, .set_user = set_midr },
{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
 
/*
@@ -2609,7 +2626,6 @@ id_to_sys_reg_desc(struct kvm_vcpu *vcpu, u64 id,
((struct sys_reg_desc *)r)->val = read_sysreg(reg); \
}
 
-FUNCTION_INVARIANT(midr_el1)
 FUNCTION_INVARIANT(revidr_el1)
 FUNCTION_INVARIANT(clidr_el1)
 FUNCTION_INVARIANT(aidr_el1)
@@ -2621,7 +2637,6 @@ static void get_ctr_el0(struct kvm_vcpu *v, const struct 
sys_reg_desc *r)
 
 /* ->val is filled in by kvm_sys_reg_table_init() */
 static struct sys_reg_desc invariant_sys_regs[] = {
-   { SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
{ SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
{ SYS_DESC(SYS_CLIDR_EL1), NULL, get_clidr_el1 },
{ SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
-- 
2.34.1


-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/3] KVM: arm64: Handle CCSIDR associativity mismatches

2022-12-02 Thread Marc Zyngier
On Fri, 02 Dec 2022 05:17:12 +,
Akihiko Odaki  wrote:
> 
> >> On M2 MacBook Air, I have seen no other difference in standard ID
> >> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> >> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> >> that without more analysis. This is still enough to migrate vCPU
> >> running Linux at least.
> > 
> > I guess that MacOS hides more of the underlying HW than KVM does. And
> > KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
> > different between the two clusters.
> 
> It seems KVM stores a MIDR value of a CPU and reuse it as "invariant"
> value for ioctls while it exposes the MIDR value each physical CPU
> owns to vCPU.

This only affects the VMM though, and not the guest which sees the
MIDR of the CPU it runs on. The problem is that, short of pinning
the vcpus, you don't know where they will run. So any value is fair
game.

> This may be a problem worth fixing. My understanding is that while
> there is no serious application which requires vCPU migration among
> physical clusters,

Hey, I do that all the time with kvmtool! It's just that my guests do
not care about being run on one CPU or another.

> crosvm uses KVM on big.LITTLE processors by pinning
> vCPU to physical CPU, and it is a real-world application which needs
> to be supported.
> 
> For an application like crosvm, you would expect the vCPU thread gets
> the MIDR value of the physical CPU which the thread is pinned to when
> it calls ioctl, but it can get one of another arbitrary CPU in
> reality.

No. It will get the MIDR of the CPU it runs on. Check again. What you
are describing above is solely for userspace.

> 
> Fixing this problem will pose two design questions:
> 
> 1. Should it expose a value consistent among clusters?
> 
> For example, we can change the KVM initialization code so that it
> initializes VPIDR with the value stored as "invariant". This would
> help migrating vCPU among clusters, but if you pin each vCPU thread to
> a distinct phyiscal CPU, you may rather want the vCPU to see the MIDR
> value specific to each physical CPU and to apply quirks or tuning
> parameters according to the value.

Which is what happens. Not at the cluster level, but at the CPU
level. The architecture doesn't describe what a *cluster* is.

> 2. Should it be invariant or variable?
> 
> Fortunately making it variable is easy. Arm provides VPIDR_EL1
> register to specify the value exposed as MPIDR_EL0 so there is no
> trapping cost.

And if you do that you make it impossible for the guest to mitigate
errata, as most of the errata handling is based on the MIDR values.

> ...or we may just say the value of MPIDR_EL0 (and possibly other

I assume you meant MIDR_EL1 here, as MPIDR_EL1 is something else (and
it has no _EL0 equivalent).

> "invariant" registers) exposed via ioctl are useless and deprecated.

Useless? Not really. They are all meaningful to the guest, and a change
there will cause issues.

CTR_EL0 must, for example, be an invariant. Otherwise, you need to
trap all the CMOs when the {I,D}minLine values that are restored from
userspace are bigger than the ones the HW has. Even worse, when the
DIC/IDC bits are set from userspace while the HW has them cleared: you
cannot mitigate that one, and you'll end up with memory corruption.
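
To illustrate the stride problem with a sketch (the CTR_EL0 field offsets are
the architected ones, everything else here is invented): cache maintenance by
VA derives its line stride from CTR_EL0, so advertising a bigger line than
the HW really has makes the loop below skip lines.

/* CTR_EL0.DminLine, bits [19:16]: log2(words) of the smallest D-cache line */
static inline unsigned long dcache_line_size(unsigned long ctr_el0)
{
	return 4UL << ((ctr_el0 >> 16) & 0xf);	/* in bytes */
}

/* Clean+invalidate [addr, addr + size) by VA, striding by the advertised line */
static void dc_civac_range(void *addr, unsigned long size, unsigned long ctr_el0)
{
	unsigned long line = dcache_line_size(ctr_el0);
	unsigned long cur = (unsigned long)addr & ~(line - 1);
	unsigned long end = (unsigned long)addr + size;

	for (; cur < end; cur += line)
		asm volatile("dc civac, %0" : : "r" (cur) : "memory");
}

Restore a DminLine one larger than the real one and the stride doubles, so
half of the lines are never touched.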

I've been toying with the idea of exposing to guests the list of
MIDR/REVIDR the guest is allowed to run on, as a PV service. This
would allow that guest to enable all the mitigations it wants in one
go.

Not sure I have time for this at the moment, but that'd be something
to explore.

[...]

> > So let's first build on top of HCR_EL2.TID2, and only then once we
> > have an idea of the overhead add support for HCR_EL2.TID4 for the
> > systems that have FEAT_EVT.
> 
> That sounds good, I'll write a new series according to this idea.

Thanks!

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/3] KVM: arm64: Handle CCSIDR associativity mismatches

2022-12-01 Thread Marc Zyngier
On Thu, 01 Dec 2022 18:29:51 +,
Oliver Upton  wrote:
> 
> On Thu, Dec 01, 2022 at 11:06:50AM +, Marc Zyngier wrote:
> 
> [...]
> 
> > It would be a lot better to expose a virtual topology
> > (one set, one way, one level). It would also save us from the CCSIDRX
> > silliness.
> > 
> > The only complexity would be to still accept different topologies from
> > userspace so that we can restore a VM saved before this virtual
> > topology.
> 
> I generally agree that the reported topology is meaningless to
> non-secure software.
> 
> However, with the cloud vendor hat on, I'm worried that inevitably some
> customer will inspect the cache topology of the VM we've provided them
> and complain.

That's their prerogative. It is idiotic, but I guess paying customers
get this privilege ;-).

> Could we extend your suggestion about accepting different topologies to
> effectively tolerate _any_ topology provided by userspace? KVM can
> default to the virtual topology, but a well-informed userspace could
> still provide different values to its guest. No point in trying to
> babyproof the UAPI further, IMO.

I think this is *exactly* what I suggested. Any valid topology should
be able to be restored, as we currently present the VM with any
topology the host HW may have. This must be preserved.

Eventually, we may even have to expose CCSIDRX, but let's cross that
bridge when we get to it.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/3] KVM: arm64: Handle CCSIDR associativity mismatches

2022-12-01 Thread Marc Zyngier
On Thu, 01 Dec 2022 17:26:08 +,
Akihiko Odaki  wrote:
> 
> On 2022/12/01 20:06, Marc Zyngier wrote:
> > On Thu, 01 Dec 2022 10:49:11 +,
> > Akihiko Odaki  wrote:
> > 
> > Thanks for looking into this.
> > 
> >> M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
> >> bits a KVM vCPU sees inconsistent when migrating.
> > 
> > Can you describe the actual discrepancy? Is that an issue between the
> > two core types? In which case, nothing says that these two cluster
> > should have the same cache topology.
> 
> Yes, the processor has big.LITTLE configuration.
> 
> On the processor, the valid CSSELR values are 0 (L1D), 1 (L1I), 3
> (L2D). For each CSSELR values, each cluster has:
> - 0x700FE03A, 0x203FE01A, 0x70FFE07B
> - 0x701FE03A, 0x203FE02A, 0x73FFE07B

This is a perfectly valid configuration. The architecture doesn't
place any limitation on how different or identical the cache
hierarchies are from the PoV of each CPU. Actually, most big-little
systems show similar differences across their clusters.

> >> It also makes QEMU fail restoring the vCPU registers because QEMU saves
> >> and restores all of the registers including CCSIDRs, and if the vCPU
> >> migrated among physical CPUs between saving and restoring, it tries to
> >> restore CCSIDR values that mismatch with the current physical CPU, which
> >> causes EFAULT.
> > 
> > Well, QEMU will have plenty of other problems, starting with MIDRs,
> > which always reflect the physical one. In general, KVM isn't well
> > geared for VMs spanning multiple CPU types. It is improving, but there
> > is a long way to go.
> 
> On M2 MacBook Air, I have seen no other difference in standard ID
> registers and CCSIDRs are exceptions. Perhaps Apple designed this way
> so that macOS's Hypervisor can freely migrate vCPU, but I can't assure
> that without more analysis. This is still enough to migrate vCPU
> running Linux at least.

I guess that MacOS hides more of the underlying HW than KVM does. And
KVM definitely doesn't hide the MIDR_EL1 registers, which *are*
different between the two clusters.

> >> Trap CCSIDRs if there are CCSIDR value mismatches, and override the
> >> associativity bits when handling the trap.
> > 
> > TBH, I'd rather we stop reporting this stuff altogether.
> > 
> > There is nothing a correctly written arm64 guest should do with any of
> > this (this is only useful for set/way CMOs, which non-secure SW should
> > never issue). It would be a lot better to expose a virtual topology
> > (one set, one way, one level). It would also save us from the CCSIDRX
> > silliness.
> > 
> > The only complexity would be to still accept different topologies from
> > userspace so that we can restore a VM saved before this virtual
> > topology.
> 
> Another (minor) concern is that trapping relevant registers may cost
> too much. Currently KVM traps CSSELR and CCSIDR accesses with
> HCR_TID2, but HCR_TID2 also affects CTR_EL0.

It will have an additional impact (JITs, for example, will take a hit
if they don't cache that value), but this is pretty easy to mitigate
if it proves to have too much of an impact. We already have a bunch of
fast-paths for things that we want to emulate more efficiently, and
CTR_EL0 could be one of them.

> Although I'm not sure if the register is referred frequently, Arm
> introduced FEAT_EVT to trap CSSELR and CSSIDR but not CTR_EL0 so
> there may be some case where trapping CTR_EL0 is not
> tolerated. Perhaps Arm worried that a userspace application may read
> CTR_EL0 frequently.

FEAT_EVT is one of these "let's add random traps" extensions,
culminating in FEAT_FGT. Having FEAT_EVT would make it more efficient,
but we need to support this for all revisions of the architecture.

So let's first build on top of HCR_EL2.TID2, and only then once we
have an idea of the overhead add support for HCR_EL2.TID4 for the
systems that have FEAT_EVT.
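
As a sketch of that plan (the HCR_EL2 bit names are the architectural ones;
the boolean parameter stands in for whatever FEAT_EVT detection ends up being
used, so treat it as an assumption):

/* Trap CCSIDR/CLIDR/CSSELR accesses, sparing CTR_EL0 when FEAT_EVT allows it */
static inline u64 cache_id_trap_bits(bool has_feat_evt)
{
	if (has_feat_evt)
		return HCR_TID4;	/* CCSIDR*, CLIDR, CSSELR only */

	return HCR_TID2;		/* same registers, plus CTR_EL0 */
}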

> If you think the concern on VM restoration you mentioned and the
> trapping overhead is tolerable, I'll write a new, much smaller patch
> accordingly.

That would be great, thanks. There are a number of gotchas around that
(like the 32bit stuff that already plays the emulation game), but this
is the right time to start and have something in 6.3 if you keep to it!

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/3] KVM: arm64: Handle CCSIDR associativity mismatches

2022-12-01 Thread Marc Zyngier
On Thu, 01 Dec 2022 10:49:11 +,
Akihiko Odaki  wrote:

Thanks for looking into this.

> M2 MacBook Air has mismatched CCSIDR associativity bits, which makes the
> bits a KVM vCPU sees inconsistent when migrating.

Can you describe the actual discrepancy? Is that an issue between the
two core types? In which case, nothing says that these two cluster
should have the same cache topology.

> It also makes QEMU fail restoring the vCPU registers because QEMU saves
> and restores all of the registers including CCSIDRs, and if the vCPU
> migrated among physical CPUs between saving and restoring, it tries to
> restore CCSIDR values that mismatch with the current physical CPU, which
> causes EFAULT.

Well, QEMU will have plenty of other problems, starting with MIDRs,
which always reflect the physical one. In general, KVM isn't well
geared for VMs spanning multiple CPU types. It is improving, but there
is a long way to go.

> Trap CCSIDRs if there are CCSIDR value mismatches, and override the
> associativity bits when handling the trap.

TBH, I'd rather we stop reporting this stuff altogether.

There is nothing a correctly written arm64 guest should do with any of
this (this is only useful for set/way CMOs, which non-secure SW should
never issue). It would be a lot better to expose a virtual topology
(one set, one way, one level). It would also save us from the CCSIDRX
silliness.
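
For illustration, such a virtual topology could be encoded along these lines
(field offsets are those of the non-CCIDX CCSIDR_EL1 format; the helper
itself is hypothetical):

/* CCSIDR_EL1 layout without FEAT_CCIDX */
#define CCSIDR_LINESIZE_SHIFT	0	/* log2(line size in bytes) - 4 */
#define CCSIDR_ASSOC_SHIFT	3	/* (number of ways) - 1 */
#define CCSIDR_NUMSETS_SHIFT	13	/* (number of sets) - 1 */

/* One set, one way, keeping only the real line size */
static u32 virtual_ccsidr(unsigned int line_size_log2_bytes)
{
	u32 linesize = line_size_log2_bytes - 4;

	return (linesize << CCSIDR_LINESIZE_SHIFT) |
	       (0 << CCSIDR_ASSOC_SHIFT) |
	       (0 << CCSIDR_NUMSETS_SHIFT);
}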

The only complexity would be to still accept different topologies from
userspace so that we can restore a VM saved before this virtual
topology.

Do you mind having a look at this?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/4] KVM: arm64: Parallel access faults

2022-11-30 Thread Marc Zyngier
On Tue, 29 Nov 2022 19:19:42 +,
Oliver Upton  wrote:
> 
> When I implemented the parallel faults series I was mostly focused on
> improving the performance of 8.1+ implementations which bring us
> FEAT_HAFDBS. In so doing, I failed to put access faults on the read side
> of the MMU lock.
> 
> Anyhow, this small series adds support for handling access faults in
> parallel, piling on top of the infrastructure from the first parallel
> faults series. As most large systems I'm aware of are 8.1+ anyway, I
> don't expect this series to provide significant uplift beyond some
> oddball machines Marc has lying around. Don't get me wrong, I'd love to
> have a D05 to play with too...

Hey, that puts the whole fruity range of machines in the oddball
department too, as they don't implement any of HAFDBS!

The feature being optional, I wouldn't be surprised if others either
didn't implement it, or disabled it to hide that it is b0rk3n...

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/4] KVM: arm64: Don't serialize if the access flag isn't set

2022-11-30 Thread Marc Zyngier
On Wed, 30 Nov 2022 01:23:20 +,
Ricardo Koller  wrote:
> 
> On Tue, Nov 29, 2022 at 09:15:21PM +, Oliver Upton wrote:
> > Hi Ricardo,
> > 
> > Thanks for having a look.
> > 
> > On Tue, Nov 29, 2022 at 12:52:12PM -0800, Ricardo Koller wrote:
> > > On Tue, Nov 29, 2022 at 07:19:44PM +, Oliver Upton wrote:
> > 
> > [...]
> > 
> > > > +   ret = stage2_update_leaf_attrs(pgt, addr, 1, 
> > > > KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
> > > > +  &pte, NULL, 0);
> > > > +   if (!ret)
> > > > +   dsb(ishst);
> > > 
> > > At the moment, the only reason for stage2_update_leaf_attrs() to not
> > > update the PTE is if it's not valid:
> > > 
> > >   if (!kvm_pte_valid(pte))
> > >   return 0;
> > > 
> > > I guess you could check that as well:
> > > 
> > > + if (!ret || kvm_pte_valid(pte))
> > > + dsb(ishst);
> > 
> > Thanks for catching this.
> > 
> > Instead of pivoting on the returned PTE value, how about we return
> > -EAGAIN from the early return in stage2_attr_walker()? It would better
> > match the pattern used elsewhere in the pgtable code.
> 
> That works, although I would use another return code (e.g., EINVAL)? as
> that's not exactly a "try again" type of error.

EINVAL usually is an indication of something that went horribly wrong.

But is that really a failure mode? Here, failing to update the PTE
should not be considered a failure, but just a benign race: access
fault being taken on a CPU and the page being evicted on another (not
unlikely, as the page was marked old before).

And if I'm correct above, this is definitely a "try again" situation:
you probably won't take the same type of fault the second time though.
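
A rough sketch of how the caller could treat that (helper names as in the
quoted patch; the surrounding function and its error policy are assumptions,
not the final code):

static int handle_access_fault_sketch(struct kvm_pgtable *pgt, u64 addr)
{
	kvm_pte_t pte;
	int ret;

	ret = stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF,
				       0, &pte, NULL, 0);
	if (!ret) {
		/* The AF bit was set: publish the update */
		dsb(ishst);
		return 0;
	}

	/*
	 * -EAGAIN means the walker found an invalid PTE: the page was
	 * evicted between the access fault and this update. Benign race,
	 * any new access will simply fault again.
	 */
	return ret == -EAGAIN ? 0 : ret;
}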

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 0/2] KVM: selftests: Enable access_tracking_perf_test for arm64

2022-11-29 Thread Marc Zyngier
On Fri, 18 Nov 2022 21:15:01 +, Oliver Upton wrote:
> Small series to add support for arm64 to access_tracking_perf_test and
> correct a couple bugs along the way.
> 
> Tested on Ampere Altra w/ all supported guest modes.
> 
> v1 -> v2:
>  - Have perf_test_util indicate when to stop vCPU threads (Sean)
>  - Collect Gavin's R-b on the second patch. I left off Gavin's R-b on
>the first patch as it was retooled.
> 
> [...]

Applied to next, thanks!

[1/2] KVM: selftests: Have perf_test_util signal when to stop vCPUs
  commit: 9ec1eb1bcceec735fb3c9255cdcdbcc2acf860a0
[2/2] KVM: selftests: Build access_tracking_perf_test for arm64
  commit: 4568180411e0fb5613e217da1c693466e39b9c27

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [PATCH v5 0/8] KVM: arm64: permit MAP_SHARED mappings with MTE enabled

2022-11-29 Thread Marc Zyngier
On Thu, 3 Nov 2022 18:10:33 -0700, Peter Collingbourne wrote:
> This patch series allows VMMs to use shared mappings in MTE enabled
> guests. The first five patches were taken from Catalin's tree [1] which
> addressed some review feedback from when they were previously sent out
> as v3 of this series. The first patch from Catalin's tree makes room
> for an additional PG_arch_3 flag by making the newer PG_arch_* flags
> arch-dependent. The next four patches are based on a series that
> Catalin sent out prior to v3, whose cover letter [2] I quote from below:
> 
> [...]

No feedback has been received, so this code is obviously perfect.

Applied to next, thanks!

[1/8] mm: Do not enable PG_arch_2 for all 64-bit architectures
  commit: b0284cd29a957e62d60c2886fd663be93c56f9c0
[2/8] arm64: mte: Fix/clarify the PG_mte_tagged semantics
  commit: e059853d14ca4ed0f6a190d7109487918a22a976
[3/8] KVM: arm64: Simplify the sanitise_mte_tags() logic
  commit: 2dbf12ae132cc78048615cfa19c9be64baaf0ced
[4/8] mm: Add PG_arch_3 page flag
  commit: ef6458b1b6ca3fdb991ce4182e981a88d4c58c0f
[5/8] arm64: mte: Lock a page for MTE tag initialisation
  commit: d77e59a8fccde7fb5dd8c57594ed147b4291c970
[6/8] KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled
  commit: d89585fbb30869011b326ef26c94c3137d228df9
[7/8] KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled
  commit: c911f0d4687947915f04024aa01803247fcf7f1a
[8/8] Documentation: document the ABI changes for KVM_CAP_ARM_MTE
  commit: a4baf8d2639f24d4d31983ff67c01878e7a5393f

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [PATCH v5 0/8] KVM: arm64: permit MAP_SHARED mappings with MTE enabled

2022-11-24 Thread Marc Zyngier
On Fri, 04 Nov 2022 17:42:27 +,
Peter Collingbourne  wrote:
> 
> On Fri, Nov 4, 2022 at 9:23 AM Marc Zyngier  wrote:
> >
> > On Fri, 04 Nov 2022 01:10:33 +,
> > Peter Collingbourne  wrote:
> > >
> > > Hi,
> > >
> > > This patch series allows VMMs to use shared mappings in MTE enabled
> > > guests. The first five patches were taken from Catalin's tree [1] which
> > > addressed some review feedback from when they were previously sent out
> > > as v3 of this series. The first patch from Catalin's tree makes room
> > > for an additional PG_arch_3 flag by making the newer PG_arch_* flags
> > > arch-dependent. The next four patches are based on a series that
> > > Catalin sent out prior to v3, whose cover letter [2] I quote from below:
> > >
> > > > This series aims to fix the races between initialising the tags on a
> > > > page and setting the PG_mte_tagged flag. Currently the flag is set
> > > > either before or after that tag initialisation and this can lead to CoW
> > > > copying stale tags. The first patch moves the flag setting after the
> > > > tags have been initialised, solving the CoW issue. However, concurrent
> > > > mprotect() on a shared mapping may (very rarely) lead to valid tags
> > > > being zeroed.
> > > >
> > > > The second skips the sanitise_mte_tags() call in kvm_set_spte_gfn(),
> > > > deferring it to user_mem_abort(). The outcome is that no
> > > > sanitise_mte_tags() can be simplified to skip the pfn_to_online_page()
> > > > check and only rely on VM_MTE_ALLOWED vma flag that can be checked in
> > > > user_mem_abort().
> > > >
> > > > The third and fourth patches use PG_arch_3 as a lock for page tagging,
> > > > based on Peter Collingbourne's idea of a two-bit lock.
> > > >
> > > > I think the first patch can be queued but the rest needs some in depth
> > > > review and test. With this series (if correct) we could allos MAP_SHARED
> > > > on KVM guest memory but this is to be discussed separately as there are
> > > > some KVM ABI implications.
> > >
> > > In this v5 I rebased Catalin's tree onto -next again. Please double check
> >
> > Please don't use -next as a base. In-flight series should be based
> > on a *stable* tag, either 6.0 or one of the early -RCs. If there is a
> > known conflict with -next, do mention it in the cover letter and
> > provide a resolution.
> 
> Okay, I will keep that in mind.
> 
> > > my rebase, which resolved the conflict with commit a8e5e5146ad0 ("arm64:
> > > mte: Avoid setting PG_mte_tagged if no tags cleared or restored").
> >
> > This commit seems part of -rc1, so I guess the patches directly apply
> > on top of that tag?
> 
> Yes, sorry, this also applies cleanly to -rc1.
> 
> > > I now have Reviewed-by for all patches except for the last one, which adds
> > > the documentation. Thanks for the reviews so far, and please take a look!
> >
> > I'd really like the MM folks (list now cc'd) to look at the relevant
> > patches (1 and 5) and ack them before I take this.
> 
> Okay, here are the lore links for the convenience of the MM folks:
> https://lore.kernel.org/all/20221104011041.290951-2-...@google.com/
> https://lore.kernel.org/all/20221104011041.290951-6-...@google.com/

I have not seen any Ack from the MM folks so far, and we're really
running out of runway for this merge window.

Short of someone shouting now, I'll take the series into the kvmarm
tree early next week.

Thanks,


-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 13/16] KVM: arm64: PMU: Implement PMUv3p5 long counter support

2022-11-24 Thread Marc Zyngier
On Wed, 23 Nov 2022 17:11:41 +,
Reiji Watanabe  wrote:
> 
> Hi Marc,
> 
> On Wed, Nov 23, 2022 at 3:11 AM Marc Zyngier  wrote:
> >
> > On Wed, 23 Nov 2022 05:58:17 +,
> > Reiji Watanabe  wrote:
> > >
> > > Hi Marc,
> > >
> > > On Sun, Nov 13, 2022 at 8:46 AM Marc Zyngier  wrote:
> > > >
> > > > PMUv3p5 (which is mandatory with ARMv8.5) comes with some extra
> > > > features:
> > > >
> > > > - All counters are 64bit
> > > >
> > > > - The overflow point is controlled by the PMCR_EL0.LP bit
> > > >
> > > > Add the required checks in the helpers that control counter
> > > > width and overflow, as well as the sysreg handling for the LP
> > > > bit. A new kvm_pmu_is_3p5() helper makes it easy to spot the
> > > > PMUv3p5 specific handling.
> > > >
> > > > Signed-off-by: Marc Zyngier 
> > > > ---
> > > >  arch/arm64/kvm/pmu-emul.c | 8 +---
> > > >  arch/arm64/kvm/sys_regs.c | 4 
> > > >  include/kvm/arm_pmu.h | 7 +++
> > > >  3 files changed, 16 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > > > index 4320c389fa7f..c37cc67ff1d7 100644
> > > > --- a/arch/arm64/kvm/pmu-emul.c
> > > > +++ b/arch/arm64/kvm/pmu-emul.c
> > > > @@ -52,13 +52,15 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
> > > >   */
> > > >  static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > > >  {
> > > > -   return (select_idx == ARMV8_PMU_CYCLE_IDX);
> > > > +   return (select_idx == ARMV8_PMU_CYCLE_IDX || 
> > > > kvm_pmu_is_3p5(vcpu));
> > > >  }
> > > >
> > > >  static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 
> > > > select_idx)
> > > >  {
> > > > -   return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> > > > -   __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> > > > +   u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);
> > > > +
> > > > +   return (select_idx < ARMV8_PMU_CYCLE_IDX && (val & 
> > > > ARMV8_PMU_PMCR_LP)) ||
> > > > +  (select_idx == ARMV8_PMU_CYCLE_IDX && (val & 
> > > > ARMV8_PMU_PMCR_LC));
> > >
> > > Since the vCPU's PMCR_EL0 value is not always in sync with
> > > kvm->arch.dfr0_pmuver.imp, shouldn't kvm_pmu_idx_has_64bit_overflow()
> > > check kvm_pmu_is_3p5() ?
> > > (e.g. when the host supports PMUv3p5, PMCR.LP will be set by reset_pmcr()
> > > initially. Then, even if userspace sets ID_AA64DFR0_EL1.PMUVER to
> > > PMUVer_V3P1, PMCR.LP will stay the same (still set) unless PMCR is
> > > written.  So, kvm_pmu_idx_has_64bit_overflow() might return true
> > > even though the guest's PMU version is lower than PMUVer_V3P5.)

I realised that reset_pmcr() cannot result in LP being set early, as
the default PMU version isn't PMUv3p5. But I'm starting to think that
we should stop playing random tricks with PMCR reset value, and make
the whole thing as straightforward as possible. TBH, the only
information we actually need from the host is PMCR_EL0.N, so we should
limit ourselves to that.

> >
> > I can see two ways to address this: either we spray PMUv3p5 checks
> > every time we evaluate PMCR, or we sanitise PMCR after each userspace
> > write to ID_AA64DFR0_EL1.
> >
> > I'd like to be able to take what is stored in the register file at
> > face value, so I'm angling towards the second possibility. It also
> 
> I thought about that too.  What makes it a bit tricky is that
> given that kvm->arch.dfr0_pmuver.imp is shared among all vCPUs
> for the guest, updating the PMCR should be done for all the vCPUs.

Yeah, good point. This really is a mess.

> > +static void update_dfr0_pmuver(struct kvm_vcpu *vcpu, u8 pmuver)
> > +{
> > +   if (vcpu->kvm->arch.dfr0_pmuver.imp != pmuver) {
> > +   vcpu->kvm->arch.dfr0_pmuver.imp = pmuver;
> > +   __reset_pmcr(vcpu, __vcpu_sys_reg(vcpu, PMCR_EL0));
> > +   }
> > +}
> 
> Or if userspace is expected to set ID_AA64DFR0_EL1 (PMUVER) for
> each vCPU, update_dfr0_pmuver() should update PMCR even when
> 'kvm->arch.dfr0_pmuver.imp' is the same as the given 'pmuver'.
> (as PMCR for the vCPU might have not been updated yet)
> 
> > makes some sense from 

Re: [PATCH v4 13/16] KVM: arm64: PMU: Implement PMUv3p5 long counter support

2022-11-23 Thread Marc Zyngier
On Wed, 23 Nov 2022 05:58:17 +,
Reiji Watanabe  wrote:
> 
> Hi Marc,
> 
> On Sun, Nov 13, 2022 at 8:46 AM Marc Zyngier  wrote:
> >
> > PMUv3p5 (which is mandatory with ARMv8.5) comes with some extra
> > features:
> >
> > - All counters are 64bit
> >
> > - The overflow point is controlled by the PMCR_EL0.LP bit
> >
> > Add the required checks in the helpers that control counter
> > width and overflow, as well as the sysreg handling for the LP
> > bit. A new kvm_pmu_is_3p5() helper makes it easy to spot the
> > PMUv3p5 specific handling.
> >
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/pmu-emul.c | 8 +---
> >  arch/arm64/kvm/sys_regs.c | 4 
> >  include/kvm/arm_pmu.h | 7 +++
> >  3 files changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > index 4320c389fa7f..c37cc67ff1d7 100644
> > --- a/arch/arm64/kvm/pmu-emul.c
> > +++ b/arch/arm64/kvm/pmu-emul.c
> > @@ -52,13 +52,15 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
> >   */
> >  static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> >  {
> > -   return (select_idx == ARMV8_PMU_CYCLE_IDX);
> > +   return (select_idx == ARMV8_PMU_CYCLE_IDX || kvm_pmu_is_3p5(vcpu));
> >  }
> >
> >  static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 
> > select_idx)
> >  {
> > -   return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> > -   __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> > +   u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);
> > +
> > +   return (select_idx < ARMV8_PMU_CYCLE_IDX && (val & 
> > ARMV8_PMU_PMCR_LP)) ||
> > +  (select_idx == ARMV8_PMU_CYCLE_IDX && (val & 
> > ARMV8_PMU_PMCR_LC));
> 
> Since the vCPU's PMCR_EL0 value is not always in sync with
> kvm->arch.dfr0_pmuver.imp, shouldn't kvm_pmu_idx_has_64bit_overflow()
> check kvm_pmu_is_3p5() ?
> (e.g. when the host supports PMUv3p5, PMCR.LP will be set by reset_pmcr()
> initially. Then, even if userspace sets ID_AA64DFR0_EL1.PMUVER to
> PMUVer_V3P1, PMCR.LP will stay the same (still set) unless PMCR is
> written.  So, kvm_pmu_idx_has_64bit_overflow() might return true
> even though the guest's PMU version is lower than PMUVer_V3P5.)

I can see two ways to address this: either we spray PMUv3p5 checks
every time we evaluate PMCR, or we sanitise PMCR after each userspace
write to ID_AA64DFR0_EL1.

I'd like to be able to take what is stored in the register file at
face value, so I'm angling towards the second possibility. It also
makes some sense from a 'HW' perspective: you change the HW
dynamically by selecting a new version, the HW comes up with its reset
configuration (i.e don't expect PMCR to stick after you write to
DFR0 with a different PMUVer).
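A minimal sketch of what that second option could look like (illustrative
only, not the eventual implementation; a full PMCR reset on the PMUVer
write is another possibility), reusing the kvm_pmu_is_3p5() helper that
this series introduces:

        static void sanitise_pmcr_for_pmuver(struct kvm_vcpu *vcpu)
        {
                u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);

                /* PMCR_EL0.LP is RES0 below PMUv3p5, so drop it on a downgrade */
                if (!kvm_pmu_is_3p5(vcpu))
                        val &= ~ARMV8_PMU_PMCR_LP;

                __vcpu_sys_reg(vcpu, PMCR_EL0) = val;
        }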

>
> 
> >  }
> >
> >  static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index dc201a0557c0..615cb148e22a 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -654,6 +654,8 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const 
> > struct sys_reg_desc *r)
> >| (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
> > if (!kvm_supports_32bit_el0())
> > val |= ARMV8_PMU_PMCR_LC;
> > +   if (!kvm_pmu_is_3p5(vcpu))
> > +   val &= ~ARMV8_PMU_PMCR_LP;
> > __vcpu_sys_reg(vcpu, r->reg) = val;
> >  }
> >
> > @@ -703,6 +705,8 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct 
> > sys_reg_params *p,
> > val |= p->regval & ARMV8_PMU_PMCR_MASK;
> > if (!kvm_supports_32bit_el0())
> > val |= ARMV8_PMU_PMCR_LC;
> > +   if (!kvm_pmu_is_3p5(vcpu))
> > +   val &= ~ARMV8_PMU_PMCR_LP;
> > __vcpu_sys_reg(vcpu, PMCR_EL0) = val;
> > kvm_pmu_handle_pmcr(vcpu, val);
> > kvm_vcpu_pmu_restore_guest(vcpu);
> 
> For the read case of access_pmcr() (the code below),
> since PMCR.LP is RES0 when FEAT_PMUv3p5 is not implemented,
> shouldn't it clear PMCR.LP if kvm_pmu_is_3p5(vcpu) is false ?
> (Similar issue to kvm_pmu_idx_has_64bit_overflow())
> 
> } else {
> /* PMCR.P & PMCR.C are RAZ */
> val = __vcpu_sys_reg(vcpu, PMCR_EL0)
>   & ~(ARMV8_PMU_PMCR_P

Re: [PATCH v4 12/16] KVM: arm64: PMU: Allow ID_DFR0_EL1.PerfMon to be set from userspace

2022-11-19 Thread Marc Zyngier

On 2022-11-19 05:52, Reiji Watanabe wrote:

Hi Marc,

On Sun, Nov 13, 2022 at 8:46 AM Marc Zyngier  wrote:


Allow userspace to write ID_DFR0_EL1, on the condition that only
the PerfMon field can be altered and be something that is compatible
with what was computed for the AArch64 view of the guest.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/sys_regs.c | 57 
++-

 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3cbcda665d23..dc201a0557c0 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1070,6 +1070,19 @@ static u8 vcpu_pmuver(const struct kvm_vcpu 
*vcpu)

return vcpu->kvm->arch.dfr0_pmuver.unimp;
 }

+static u8 perfmon_to_pmuver(u8 perfmon)
+{
+   switch (perfmon) {
+   case ID_DFR0_PERFMON_8_0:
+   return ID_AA64DFR0_EL1_PMUVer_IMP;
+   case ID_DFR0_PERFMON_IMP_DEF:
+   return ID_AA64DFR0_EL1_PMUVer_IMP_DEF;


Nit: Since IMP_DEF is 0xf for both PMUVER and PERFMON,
I think the 'default' can handle IMP_DEF (I have the same
comment for pmuver_to_perfmon in the patch-10).


It sure can, but IMP_DEF is special enough in its treatment
(we explicitly check this value in set_id_dfr0_el1()) that
it actually helps the reader to keep the explicit conversion
here.
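For completeness, here is the shape the helper would take if the default
arm also absorbed IMP_DEF, as Reiji suggests (a sketch, behaviourally
equivalent since both encodings use 0xf for IMP_DEF and 0x0 for NI):

        static u8 perfmon_to_pmuver(u8 perfmon)
        {
                switch (perfmon) {
                case ID_DFR0_PERFMON_8_0:
                        return ID_AA64DFR0_EL1_PMUVer_IMP;
                default:
                        /* NI, IMP_DEF and anything ARMv8.1+ share the same encoding */
                        return perfmon;
                }
        }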




+   default:
+   /* Anything ARMv8.1+ has the same value. For now. */


Nit: Shouldn't the comment also mention NI (and IMP_DEF) ?
(I have the same comment for pmuver_to_perfmon in the patch-10)


I can expand the comment to include NI.


Otherwise:
Reviewed-by: Reiji Watanabe 


Thanks,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 09/16] KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits

2022-11-19 Thread Marc Zyngier

On 2022-11-18 07:45, Reiji Watanabe wrote:

Hi Marc,

On Sun, Nov 13, 2022 at 8:38 AM Marc Zyngier  wrote:


Even when using PMUv3p5 (which implies 64bit counters), there is
no way for AArch32 to write to the top 32 bits of the counters.
The only way to influence these bits (other than by counting
events) is by writing PMCR.P==1.

Make sure we obey the architecture and preserve the top 32 bits
on a counter update.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 35 +++
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index ea0c8411641f..419e5e0a13d0 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -119,13 +119,8 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu 
*vcpu, u64 select_idx)

return counter;
 }

-/**
- * kvm_pmu_set_counter_value - set PMU counter value
- * @vcpu: The vcpu pointer
- * @select_idx: The counter index
- * @val: The counter value
- */
-void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, 
u64 val)
+static void kvm_pmu_set_counter(struct kvm_vcpu *vcpu, u64 
select_idx, u64 val,

+   bool force)
 {
u64 reg;

@@ -135,12 +130,36 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu 
*vcpu, u64 select_idx, u64 val)

kvm_pmu_release_perf_event(&vcpu->arch.pmu.pmc[select_idx]);

reg = counter_index_to_reg(select_idx);
+
+   if (vcpu_mode_is_32bit(vcpu) && select_idx != 
ARMV8_PMU_CYCLE_IDX &&

+   !force) {
+   /*
+* Even with PMUv3p5, AArch32 cannot write to the top
+* 32bit of the counters. The only possible course of
+* action is to use PMCR.P, which will reset them to
+* 0 (the only use of the 'force' parameter).
+*/
+   val  = lower_32_bits(val);
+   val |= upper_32_bits(__vcpu_sys_reg(vcpu, reg));


Shouldn't the result of upper_32_bits() be shifted 32bits left
before ORing (to maintain the upper 32bits of the current value) ?


Indeed, and it only shows that AArch32 has had no testing
whatsoever :-(.

I'll fix it up locally.
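For reference, the corrected assignment would presumably shift the
preserved half back into place, along these lines (a sketch of the fix,
not the final commit):

                val  = lower_32_bits(val);
                /* preserve the current top 32 bits of the counter */
                val |= (u64)upper_32_bits(__vcpu_sys_reg(vcpu, reg)) << 32;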

Thanks again,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [External] Re: [v2 0/6] KVM: arm64: implement vcpu_is_preempted check

2022-11-17 Thread Marc Zyngier
On Mon, 07 Nov 2022 12:00:44 +,
Usama Arif  wrote:
> 
> 
> 
> On 06/11/2022 16:35, Marc Zyngier wrote:
> > On Fri, 04 Nov 2022 06:20:59 +,
> > Usama Arif  wrote:
> >> 
> >> This patchset adds support for vcpu_is_preempted in arm64, which
> >> allows the guest to check if a vcpu was scheduled out, which is
> >> useful to know incase it was holding a lock. vcpu_is_preempted can
> >> be used to improve performance in locking (see owner_on_cpu usage in
> >> mutex_spin_on_owner, mutex_can_spin_on_owner, rtmutex_spin_on_owner
> >> and osq_lock) and scheduling (see available_idle_cpu which is used
> >> in several places in kernel/sched/fair.c for e.g. in wake_affine to
> >> determine which CPU can run soonest):
> > 
> > [...]
> > 
> >> pvcy shows a smaller overall improvement (50%) compared to
> >> vcpu_is_preempted (277%).  Host side flamegraph analysis shows that
> >> ~60% of the host time when using pvcy is spent in kvm_handle_wfx,
> >> compared with ~1.5% when using vcpu_is_preempted, hence
> >> vcpu_is_preempted shows a larger improvement.
> > 
> > And have you worked out *why* we spend so much time handling WFE?
> > 
> > M.
> 
> Its from the following change in pvcy patchset:
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index e778eefcf214..915644816a85 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -118,7 +118,12 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
> }
> 
> if (esr & ESR_ELx_WFx_ISS_WFE) {
> -   kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
> +   int state;
> +   while ((state = kvm_pvcy_check_state(vcpu)) == 0)
> +   schedule();
> +
> +   if (state == -1)
> +   kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
> } else {
> if (esr & ESR_ELx_WFx_ISS_WFxT)
> vcpu_set_flag(vcpu, IN_WFIT);
> 
> 
> If my understanding of the pvcy changes is correct, whenever pvcy
> returns an unchanged vcpu state, we schedule to another vcpu, and it's
> the constant scheduling where the time is spent. I guess the effects
> are much higher when the lock contention is very high. This can be
> seen from the pvcy host-side flamegraph as well (~67% of the time is
> spent in the schedule() call in kvm_handle_wfx). For reference, I have
> put the graph at:
> https://uarif1.github.io/pvlock/perf_host_pvcy_nmi.svg

The real issue here is that we don't try to pick the right vcpu to
run, and strictly rely on schedule() to eventually pick something that
can run.

An interesting thing to do would be to try and fit the directed yield
mechanism there. It would be a lot more interesting than the one-off
vcpu_is_preempted hack, as it gives us a low-level primitive on which
to construct things (pvcy is effectively a mwait-like primitive).
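Purely for illustration, the sort of shape that could take
(kvm_vcpu_yield_to() and kvm_vcpu_on_spin() are existing KVM helpers;
pvcy_pick_target() is entirely hypothetical):

        static void kvm_handle_wfe_directed_yield(struct kvm_vcpu *vcpu)
        {
                /* Hypothetical: guess which vCPU holds the contended resource */
                struct kvm_vcpu *target = pvcy_pick_target(vcpu);

                if (target)
                        kvm_vcpu_yield_to(target);
                else
                        kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
        }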

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 0/2] KVM: arm64: selftests: Fixes for single-step test

2022-11-17 Thread Marc Zyngier
On Thu, 17 Nov 2022 00:23:48 +,
Sean Christopherson  wrote:
> 
> Marc,
> 
> I would like to route this through Paolo's tree/queue for 6.2 along with
> a big pile of other selftests updates.  I am hoping to get the selftests
> pile queued sooner than later as there is a lot of active development in
> that area, and don't want to have the selftests be in a broken state.
> I'm going to send Paolo a pull request shortly, I'll Cc you (and others)
> to keep everyone in the loop and give a chance for objections.
> 
> 
> 
> Fix a typo and an imminent not-technically-a-bug bug in the single-step
> test where executing an atomic sequence in the guest with single-step
> enabled will hang the guest due to eret clearing the local exclusive
> monitor.
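The problematic guest pattern is the usual load-exclusive/store-exclusive
retry loop; a rough illustration of why it livelocks under single-step
(the ERET back into the guest clears the exclusive monitor before the
stxr can ever succeed):

        static inline void guest_atomic_inc(unsigned long *ptr)
        {
                unsigned long tmp, res;

                asm volatile(
                "1:     ldxr    %0, [%2]\n"          /* open the exclusive monitor */
                "       add     %0, %0, #1\n"
                "       stxr    %w1, %0, [%2]\n"     /* fails if the monitor was cleared */
                "       cbnz    %w1, 1b\n"           /* ...so this retries forever */
                : "=&r" (tmp), "=&r" (res)
                : "r" (ptr)
                : "memory");
        }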
> 
> 
> Sean Christopherson (2):
>   KVM: arm64: selftests: Disable single-step with correct KVM define
>   KVM: arm64: selftests: Disable single-step without relying on ucall()

I'm obviously late to the party, but hey... For the record:

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] KVM: arm64: Don't acquire RCU read lock for exclusive table walks

2022-11-15 Thread Marc Zyngier
On Tue, 15 Nov 2022 22:55:02 +,
Oliver Upton  wrote:
> 
> Marek reported a BUG resulting from the recent parallel faults changes,
> as the hyp stage-1 map walker attempted to allocate table memory while
> holding the RCU read lock:
> 
>   BUG: sleeping function called from invalid context at
>   include/linux/sched/mm.h:274
>   in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>   preempt_count: 0, expected: 0
>   RCU nest depth: 1, expected: 0
>   2 locks held by swapper/0/1:
> #0: 8a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at:
>   __create_hyp_mappings+0x80/0xc4
> #1: 8a927720 (rcu_read_lock){}-{1:2}, at:
>   kvm_pgtable_walk+0x0/0x1f4
>   CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
>   Hardware name: Raspberry Pi 3 Model B (DT)
>   Call trace:
> dump_backtrace.part.0+0xe4/0xf0
> show_stack+0x18/0x40
> dump_stack_lvl+0x8c/0xb8
> dump_stack+0x18/0x34
> __might_resched+0x178/0x220
> __might_sleep+0x48/0xa0
> prepare_alloc_pages+0x178/0x1a0
> __alloc_pages+0x9c/0x109c
> alloc_page_interleave+0x1c/0xc4
> alloc_pages+0xec/0x160
> get_zeroed_page+0x1c/0x44
> kvm_hyp_zalloc_page+0x14/0x20
> hyp_map_walker+0xd4/0x134
> kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
> __kvm_pgtable_walk+0x1a4/0x220
> kvm_pgtable_walk+0x104/0x1f4
> kvm_pgtable_hyp_map+0x80/0xc4
> __create_hyp_mappings+0x9c/0xc4
> kvm_mmu_init+0x144/0x1cc
> kvm_arch_init+0xe4/0xef4
> kvm_init+0x3c/0x3d0
> arm_init+0x20/0x30
> do_one_initcall+0x74/0x400
> kernel_init_freeable+0x2e0/0x350
> kernel_init+0x24/0x130
> ret_from_fork+0x10/0x20
> 
> Since the hyp stage-1 table walkers are serialized by kvm_hyp_pgd_mutex,
> RCU protection really doesn't add anything. Don't acquire the RCU read
> lock for an exclusive walk. While at it, add a warning which codifies
> the lack of support for shared walks in the hypervisor code.
> 
> Reported-by: Marek Szyprowski 
> Signed-off-by: Oliver Upton 
> ---
> 
> Applies on top of the parallel faults series that was picked up last
> week. Tested with kvm-arm.mode={nvhe,protected} on an Ampere Altra
> system.
> 
> v1 -> v2:
>  - Took Will's suggestion of conditioning RCU on a flag, small tweak to
>use existing bit instead (Thanks!)
> 
>  arch/arm64/include/asm/kvm_pgtable.h | 22 --
>  arch/arm64/kvm/hyp/pgtable.c |  5 +++--
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
> b/arch/arm64/include/asm/kvm_pgtable.h
> index a874ce0ce7b5..d4c7321fa652 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -51,8 +51,16 @@ static inline kvm_pte_t 
> *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>   return pteref;
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void) {}
> -static inline void kvm_pgtable_walk_end(void) {}
> +static inline void kvm_pgtable_walk_begin(bool shared)
> +{
> + /*
> +  * Due to the lack of RCU (or a similar protection scheme), only
> +  * non-shared table walkers are allowed in the hypervisor.
> +  */
> + WARN_ON(shared);
> +}
> +
> +static inline void kvm_pgtable_walk_end(bool shared) {}
>  
>  static inline bool kvm_pgtable_walk_lock_held(void)
>  {
> @@ -68,14 +76,16 @@ static inline kvm_pte_t 
> *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>   return rcu_dereference_check(pteref, !shared);
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void)
> +static inline void kvm_pgtable_walk_begin(bool shared)

I'm not crazy about this sort of parameter. I think it would make a
lot more sense to pass a pointer to the walker structure and do the
flag check inside the helper.

That way, we avoid extra churn if/when we need extra state or
bookkeeping around the walk.
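Something along the lines of the following, presumably (a sketch of the
suggestion, reusing the KVM_PGTABLE_WALK_SHARED flag from the parallel
faults series instead of a bool):

        static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
        {
                /*
                 * Hyp table walks are serialized by kvm_hyp_pgd_mutex, so
                 * shared (RCU-protected) walkers are not supported here.
                 */
                WARN_ON(walker->flags & KVM_PGTABLE_WALK_SHARED);
        }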

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 2/3] KVM: arm64: Allow userspace to trap SMCCC sub-ranges

2022-11-14 Thread Marc Zyngier
On Fri, 11 Nov 2022 23:39:09 +,
Oliver Upton  wrote:
> 
> On Fri, Nov 11, 2022 at 08:26:02AM +, Marc Zyngier wrote:
> > On Thu, 10 Nov 2022 21:13:54 +, Oliver Upton  
> > wrote:
> > > The goal of what I was trying to get at is that either the kernel or
> > > userspace takes ownership of a range that has an ABI, but not both. i.e.
> > > you really wouldn't want some VMM or cloud provider trapping portions of
> > > KVM's vendor-specific range while still reporting a 'vanilla' ABI at the
> > > time of discovery. Same goes for PSCI, TRNG, etc.
> > 
> > But I definitely think this is one of the major use cases. For
> > example, there is value in taking PSCI to userspace in order to
> > implement a newer version of the spec, or to support sub-features that
> > KVM doesn't (want to) implement. I don't think this changes the ABI from
> > the guest perspective.
> 
> I disagree about the implications of partially trapping the 'Vendor
> Specific Hypervisor Service'. If the UID for the range still reports KVM
> but userspace decided to add some new widget, then from the guest
> perspective that widget is now part of KVM's own ABI with the guest.

But that's what I mean by "I don't think this changes the ABI from the
guest perspective". The guest cannot know who is doing the emulation
anyway, so it is userspace's duty to preserve the illusion. At the
end of the day, this is only a configuration mechanism, and it is no
different from all other configuration bits (i.e. they need to be
identical on both sides for migration).

> Trapping the whole range is a bit of a hack to work around the need for
> more complicated verification of a hypercall filter.

We already need these things for architected hypercalls. Once we have
the infrastructure, it doesn't matter anymore which range this is for.
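As a strawman for what such a filter could look like (everything below is
hypothetical and not an existing KVM ABI): userspace would describe a
range of SMCCC function IDs and the action to take for that range.

        struct kvm_smccc_filter {
                __u32   base;           /* first SMCCC function ID in the range */
                __u32   nr_functions;   /* number of function IDs covered */
                __u8    action;         /* e.g. handle in KVM, deny, forward to userspace */
                __u8    pad[15];
        };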

> 
> But for everything else, I'm fine with arbitrary function filtering.
> Userspace is always welcome to shoot itself in the foot.
> 
> > pKVM also has a use case for this where userspace gets a notification
> > of the hypercall that a guest has performed to share memory.
> 
> Is that hypercall in the 'Vendor Specific Hypervisor Service' range?

Yes. It is yet another KVM hypercall.

> 
> > Communication with a TEE also is on the cards, as would be a FFA
> > implementation. All of this could be implemented in KVM, or in
> > userspace, depending what users of these misfeatures want to do.
> 
> I'm very hopeful that by forwarding all of this to userspace we can get
> out of the business of implementing every darn spec that comes along.

Good luck. All the TEEs have private, home grown APIs, and every
vendor will want to implement their own crap (i.e. there is no spec).

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 16/16] KVM: arm64: PMU: Make kvm_pmc the main data structure

2022-11-13 Thread Marc Zyngier
The PMU code has historically been torn between referencing a counter
as a pair vcpu+index or as the PMC pointer.

Given that it is pretty easy to go from one representation to
the other, standardise on the latter which, IMHO, makes the
code slightly more readable. YMMV.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 174 +++---
 1 file changed, 87 insertions(+), 87 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index e3d5fe260dcc..cf929c626b79 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -22,9 +22,19 @@ DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
 static LIST_HEAD(arm_pmus);
 static DEFINE_MUTEX(arm_pmus_lock);
 
-static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
+static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc);
 static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc);
 
+static struct kvm_vcpu *kvm_pmc_to_vcpu(const struct kvm_pmc *pmc)
+{
+   return container_of(pmc, struct kvm_vcpu, arch.pmu.pmc[pmc->idx]);
+}
+
+static struct kvm_pmc *kvm_vcpu_idx_to_pmc(struct kvm_vcpu *vcpu, int cnt_idx)
+{
+   return &vcpu->arch.pmu.pmc[cnt_idx];
+}
+
 static u32 kvm_pmu_event_mask(struct kvm *kvm)
 {
unsigned int pmuver;
@@ -46,38 +56,27 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
 }
 
 /**
- * kvm_pmu_idx_is_64bit - determine if select_idx is a 64bit counter
- * @vcpu: The vcpu pointer
- * @select_idx: The counter index
+ * kvm_pmc_is_64bit - determine if counter is 64bit
+ * @pmc: counter context
  */
-static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
+static bool kvm_pmc_is_64bit(struct kvm_pmc *pmc)
 {
-   return (select_idx == ARMV8_PMU_CYCLE_IDX || kvm_pmu_is_3p5(vcpu));
+   return (pmc->idx == ARMV8_PMU_CYCLE_IDX ||
+   kvm_pmu_is_3p5(kvm_pmc_to_vcpu(pmc)));
 }
 
-static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 
select_idx)
+static bool kvm_pmc_has_64bit_overflow(struct kvm_pmc *pmc)
 {
-   u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);
+   u64 val = __vcpu_sys_reg(kvm_pmc_to_vcpu(pmc), PMCR_EL0);
 
-   return (select_idx < ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LP)) 
||
-  (select_idx == ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LC));
+   return (pmc->idx < ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LP)) ||
+  (pmc->idx == ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LC));
 }
 
-static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
+static bool kvm_pmu_counter_can_chain(struct kvm_pmc *pmc)
 {
-   return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX &&
-   !kvm_pmu_idx_has_64bit_overflow(vcpu, idx));
-}
-
-static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
-{
-   struct kvm_pmu *pmu;
-   struct kvm_vcpu_arch *vcpu_arch;
-
-   pmc -= pmc->idx;
-   pmu = container_of(pmc, struct kvm_pmu, pmc[0]);
-   vcpu_arch = container_of(pmu, struct kvm_vcpu_arch, pmu);
-   return container_of(vcpu_arch, struct kvm_vcpu, arch);
+   return (!(pmc->idx & 1) && (pmc->idx + 1) < ARMV8_PMU_CYCLE_IDX &&
+   !kvm_pmc_has_64bit_overflow(pmc));
 }
 
 static u32 counter_index_to_reg(u64 idx)
@@ -90,21 +89,12 @@ static u32 counter_index_to_evtreg(u64 idx)
return (idx == ARMV8_PMU_CYCLE_IDX) ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + 
idx;
 }
 
-/**
- * kvm_pmu_get_counter_value - get PMU counter value
- * @vcpu: The vcpu pointer
- * @select_idx: The counter index
- */
-u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
+static u64 kvm_pmu_get_pmc_value(struct kvm_pmc *pmc)
 {
+   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
u64 counter, reg, enabled, running;
-   struct kvm_pmu *pmu = &vcpu->arch.pmu;
-   struct kvm_pmc *pmc = &pmu->pmc[select_idx];
-
-   if (!kvm_vcpu_has_pmu(vcpu))
-   return 0;
 
-   reg = counter_index_to_reg(select_idx);
+   reg = counter_index_to_reg(pmc->idx);
counter = __vcpu_sys_reg(vcpu, reg);
 
/*
@@ -115,25 +105,35 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx)
counter += perf_event_read_value(pmc->perf_event, &enabled,
 &running);
 
-   if (!kvm_pmu_idx_is_64bit(vcpu, select_idx))
+   if (!kvm_pmc_is_64bit(pmc))
counter = lower_32_bits(counter);
 
return counter;
 }
 
-static void kvm_pmu_set_counter(struct kvm_vcpu *vcpu, u64 select_idx, u64 val,
-   bool force)
+/**
+ * kvm_pmu_get_counter_value - get PMU counter value
+ * @vcpu: The vcpu pointer
+ * @select_idx: The counter index
+ */
+u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
 {
-   u64 reg;
-
if (!kvm_vcpu_has_pmu(

[PATCH v4 12/16] KVM: arm64: PMU: Allow ID_DFR0_EL1.PerfMon to be set from userspace

2022-11-13 Thread Marc Zyngier
Allow userspace to write ID_DFR0_EL1, on the condition that only
the PerfMon field can be altered and be something that is compatible
with what was computed for the AArch64 view of the guest.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/sys_regs.c | 57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3cbcda665d23..dc201a0557c0 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1070,6 +1070,19 @@ static u8 vcpu_pmuver(const struct kvm_vcpu *vcpu)
return vcpu->kvm->arch.dfr0_pmuver.unimp;
 }
 
+static u8 perfmon_to_pmuver(u8 perfmon)
+{
+   switch (perfmon) {
+   case ID_DFR0_PERFMON_8_0:
+   return ID_AA64DFR0_EL1_PMUVer_IMP;
+   case ID_DFR0_PERFMON_IMP_DEF:
+   return ID_AA64DFR0_EL1_PMUVer_IMP_DEF;
+   default:
+   /* Anything ARMv8.1+ has the same value. For now. */
+   return perfmon;
+   }
+}
+
 static u8 pmuver_to_perfmon(u8 pmuver)
 {
switch (pmuver) {
@@ -1281,6 +1294,46 @@ static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
return 0;
 }
 
+static int set_id_dfr0_el1(struct kvm_vcpu *vcpu,
+  const struct sys_reg_desc *rd,
+  u64 val)
+{
+   u8 perfmon, host_perfmon;
+   bool valid_pmu;
+
+   host_perfmon = pmuver_to_perfmon(kvm_arm_pmu_get_pmuver_limit());
+
+   /*
+* Allow DFR0_EL1.PerfMon to be set from userspace as long as
+* it doesn't promise more than what the HW gives us on the
+* AArch64 side (as everything is emulated with that), and
+* that this is a PMUv3.
+*/
+   perfmon = FIELD_GET(ARM64_FEATURE_MASK(ID_DFR0_PERFMON), val);
+   if ((perfmon != ID_DFR0_PERFMON_IMP_DEF && perfmon > host_perfmon) ||
+   (perfmon != 0 && perfmon < ID_DFR0_PERFMON_8_0))
+   return -EINVAL;
+
+   valid_pmu = (perfmon != 0 && perfmon != ID_DFR0_PERFMON_IMP_DEF);
+
+   /* Make sure view register and PMU support do match */
+   if (kvm_vcpu_has_pmu(vcpu) != valid_pmu)
+   return -EINVAL;
+
+   /* We can only differ with PerfMon, and anything else is an error */
+   val ^= read_id_reg(vcpu, rd);
+   val &= ~ARM64_FEATURE_MASK(ID_DFR0_PERFMON);
+   if (val)
+   return -EINVAL;
+
+   if (valid_pmu)
+   vcpu->kvm->arch.dfr0_pmuver.imp = perfmon_to_pmuver(perfmon);
+   else
+   vcpu->kvm->arch.dfr0_pmuver.unimp = perfmon_to_pmuver(perfmon);
+
+   return 0;
+}
+
 /*
  * cpufeature ID register user accessors
  *
@@ -1502,7 +1555,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* CRm=1 */
AA32_ID_SANITISED(ID_PFR0_EL1),
AA32_ID_SANITISED(ID_PFR1_EL1),
-   AA32_ID_SANITISED(ID_DFR0_EL1),
+   { SYS_DESC(SYS_ID_DFR0_EL1), .access = access_id_reg,
+ .get_user = get_id_reg, .set_user = set_id_dfr0_el1,
+ .visibility = aa32_id_visibility, },
ID_HIDDEN(ID_AFR0_EL1),
AA32_ID_SANITISED(ID_MMFR0_EL1),
AA32_ID_SANITISED(ID_MMFR1_EL1),
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 11/16] KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace

2022-11-13 Thread Marc Zyngier
Allow userspace to write ID_AA64DFR0_EL1, on the condition that only
the PMUver field can be altered and be at most the one that was
initially computed for the guest.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/sys_regs.c | 42 ++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1d887fe289d8..3cbcda665d23 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1242,6 +1242,45 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
return 0;
 }
 
+static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
+  const struct sys_reg_desc *rd,
+  u64 val)
+{
+   u8 pmuver, host_pmuver;
+   bool valid_pmu;
+
+   host_pmuver = kvm_arm_pmu_get_pmuver_limit();
+
+   /*
+* Allow AA64DFR0_EL1.PMUver to be set from userspace as long
+* as it doesn't promise more than what the HW gives us. We
+* allow an IMPDEF PMU though, only if no PMU is supported
+* (KVM backward compatibility handling).
+*/
+   pmuver = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), val);
+   if ((pmuver != ID_AA64DFR0_EL1_PMUVer_IMP_DEF && pmuver > host_pmuver))
+   return -EINVAL;
+
+   valid_pmu = (pmuver != 0 && pmuver != ID_AA64DFR0_EL1_PMUVer_IMP_DEF);
+
+   /* Make sure view register and PMU support do match */
+   if (kvm_vcpu_has_pmu(vcpu) != valid_pmu)
+   return -EINVAL;
+
+   /* We can only differ with PMUver, and anything else is an error */
+   val ^= read_id_reg(vcpu, rd);
+   val &= ~ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer);
+   if (val)
+   return -EINVAL;
+
+   if (valid_pmu)
+   vcpu->kvm->arch.dfr0_pmuver.imp = pmuver;
+   else
+   vcpu->kvm->arch.dfr0_pmuver.unimp = pmuver;
+
+   return 0;
+}
+
 /*
  * cpufeature ID register user accessors
  *
@@ -1503,7 +1542,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_UNALLOCATED(4,7),
 
/* CRm=5 */
-   ID_SANITISED(ID_AA64DFR0_EL1),
+   { SYS_DESC(SYS_ID_AA64DFR0_EL1), .access = access_id_reg,
+ .get_user = get_id_reg, .set_user = set_id_aa64dfr0_el1, },
ID_SANITISED(ID_AA64DFR1_EL1),
ID_UNALLOCATED(5,2),
ID_UNALLOCATED(5,3),
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 14/16] KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest

2022-11-13 Thread Marc Zyngier
Now that the infrastructure is in place, bump the PMU support up
to PMUv3p5.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index c37cc67ff1d7..b7a5f75d008d 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -1057,6 +1057,6 @@ u8 kvm_arm_pmu_get_pmuver_limit(void)
tmp = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
tmp = cpuid_feature_cap_perfmon_field(tmp,
  ID_AA64DFR0_EL1_PMUVer_SHIFT,
- ID_AA64DFR0_EL1_PMUVer_V3P4);
+ ID_AA64DFR0_EL1_PMUVer_V3P5);
return FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), tmp);
 }
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 10/16] KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation

2022-11-13 Thread Marc Zyngier
As further patches will enable the selection of a PMU revision
from userspace, sample the supported PMU revision at VM creation
time, rather than building it each time the ID_AA64DFR0_EL1 register
is accessed.

This shouldn't result in any change in behaviour.

Reviewed-by: Reiji Watanabe 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_host.h |  4 
 arch/arm64/kvm/arm.c  |  6 ++
 arch/arm64/kvm/pmu-emul.c | 11 ++
 arch/arm64/kvm/sys_regs.c | 36 ---
 include/kvm/arm_pmu.h |  6 ++
 5 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 45e2136322ba..cc44e3bc528d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -163,6 +163,10 @@ struct kvm_arch {
 
u8 pfr0_csv2;
u8 pfr0_csv3;
+   struct {
+   u8 imp:4;
+   u8 unimp:4;
+   } dfr0_pmuver;
 
/* Hypercall features firmware registers' descriptor */
struct kvm_smccc_features smccc_feat;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..f956aab438c7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -164,6 +164,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
set_default_spectre(kvm);
kvm_arm_init_hypercalls(kvm);
 
+   /*
+* Initialise the default PMUver before there is a chance to
+* create an actual PMU.
+*/
+   kvm->arch.dfr0_pmuver.imp = kvm_arm_pmu_get_pmuver_limit();
+
return ret;
 out_free_stage2_pgd:
kvm_free_stage2_pgd(&kvm->arch.mmu);
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 419e5e0a13d0..4320c389fa7f 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -1047,3 +1047,14 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, 
struct kvm_device_attr *attr)
 
return -ENXIO;
 }
+
+u8 kvm_arm_pmu_get_pmuver_limit(void)
+{
+   u64 tmp;
+
+   tmp = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
+   tmp = cpuid_feature_cap_perfmon_field(tmp,
+ ID_AA64DFR0_EL1_PMUVer_SHIFT,
+ ID_AA64DFR0_EL1_PMUVer_V3P4);
+   return FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), tmp);
+}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f4a7c5abcbca..1d887fe289d8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1062,6 +1062,27 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
return true;
 }
 
+static u8 vcpu_pmuver(const struct kvm_vcpu *vcpu)
+{
+   if (kvm_vcpu_has_pmu(vcpu))
+   return vcpu->kvm->arch.dfr0_pmuver.imp;
+
+   return vcpu->kvm->arch.dfr0_pmuver.unimp;
+}
+
+static u8 pmuver_to_perfmon(u8 pmuver)
+{
+   switch (pmuver) {
+   case ID_AA64DFR0_EL1_PMUVer_IMP:
+   return ID_DFR0_PERFMON_8_0;
+   case ID_AA64DFR0_EL1_PMUVer_IMP_DEF:
+   return ID_DFR0_PERFMON_IMP_DEF;
+   default:
+   /* Anything ARMv8.1+ has the same value. For now. */
+   return pmuver;
+   }
+}
+
 /* Read a sanitised cpufeature ID register by sys_reg_desc */
 static u64 read_id_reg(const struct kvm_vcpu *vcpu, struct sys_reg_desc const 
*r)
 {
@@ -,18 +1132,17 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, 
struct sys_reg_desc const *r
/* Limit debug to ARMv8.0 */
val &= ~ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_DebugVer);
val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_DebugVer), 
6);
-   /* Limit guests to PMUv3 for ARMv8.4 */
-   val = cpuid_feature_cap_perfmon_field(val,
- 
ID_AA64DFR0_EL1_PMUVer_SHIFT,
- kvm_vcpu_has_pmu(vcpu) ? 
ID_AA64DFR0_EL1_PMUVer_V3P4 : 0);
+   /* Set PMUver to the required version */
+   val &= ~ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer);
+   val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer),
+ vcpu_pmuver(vcpu));
/* Hide SPE from guests */
val &= ~ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMSVer);
break;
case SYS_ID_DFR0_EL1:
-   /* Limit guests to PMUv3 for ARMv8.4 */
-   val = cpuid_feature_cap_perfmon_field(val,
- ID_DFR0_PERFMON_SHIFT,
- kvm_vcpu_has_pmu(vcpu) ? 
ID_DFR0_PERFMON_8_4 : 0);
+   val &= ~ARM64_FEATURE_MASK(ID_DFR0_PERFMON);
+   val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_DFR0_PERFMON),
+ pmuver_to_perfmon(vcpu_pmu

[PATCH v4 13/16] KVM: arm64: PMU: Implement PMUv3p5 long counter support

2022-11-13 Thread Marc Zyngier
PMUv3p5 (which is mandatory with ARMv8.5) comes with some extra
features:

- All counters are 64bit

- The overflow point is controlled by the PMCR_EL0.LP bit

Add the required checks in the helpers that control counter
width and overflow, as well as the sysreg handling for the LP
bit. A new kvm_pmu_is_3p5() helper makes it easy to spot the
PMUv3p5 specific handling.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 8 +---
 arch/arm64/kvm/sys_regs.c | 4 
 include/kvm/arm_pmu.h | 7 +++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 4320c389fa7f..c37cc67ff1d7 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -52,13 +52,15 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
  */
 static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
 {
-   return (select_idx == ARMV8_PMU_CYCLE_IDX);
+   return (select_idx == ARMV8_PMU_CYCLE_IDX || kvm_pmu_is_3p5(vcpu));
 }
 
 static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 
select_idx)
 {
-   return (select_idx == ARMV8_PMU_CYCLE_IDX &&
-   __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
+   u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0);
+
+   return (select_idx < ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LP)) 
||
+  (select_idx == ARMV8_PMU_CYCLE_IDX && (val & ARMV8_PMU_PMCR_LC));
 }
 
 static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index dc201a0557c0..615cb148e22a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -654,6 +654,8 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *r)
   | (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
if (!kvm_supports_32bit_el0())
val |= ARMV8_PMU_PMCR_LC;
+   if (!kvm_pmu_is_3p5(vcpu))
+   val &= ~ARMV8_PMU_PMCR_LP;
__vcpu_sys_reg(vcpu, r->reg) = val;
 }
 
@@ -703,6 +705,8 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
val |= p->regval & ARMV8_PMU_PMCR_MASK;
if (!kvm_supports_32bit_el0())
val |= ARMV8_PMU_PMCR_LC;
+   if (!kvm_pmu_is_3p5(vcpu))
+   val &= ~ARMV8_PMU_PMCR_LP;
__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
kvm_pmu_handle_pmcr(vcpu, val);
kvm_vcpu_pmu_restore_guest(vcpu);
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 812f729c9108..628775334d5e 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -89,6 +89,12 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
vcpu->arch.pmu.events = *kvm_get_pmu_events();  \
} while (0)
 
+/*
+ * Evaluates as true when emulating PMUv3p5, and false otherwise.
+ */
+#define kvm_pmu_is_3p5(vcpu)   \
+   (vcpu->kvm->arch.dfr0_pmuver.imp >= ID_AA64DFR0_EL1_PMUVer_V3P5)
+
 u8 kvm_arm_pmu_get_pmuver_limit(void);
 
 #else
@@ -153,6 +159,7 @@ static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, 
bool pmceid1)
 }
 
 #define kvm_vcpu_has_pmu(vcpu) ({ false; })
+#define kvm_pmu_is_3p5(vcpu)   ({ false; })
 static inline void kvm_pmu_update_vcpu_events(struct kvm_vcpu *vcpu) {}
 static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {}
 static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {}
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 15/16] KVM: arm64: PMU: Simplify vcpu computation on perf overflow notification

2022-11-13 Thread Marc Zyngier
The way we compute the target vcpu on getting an overflow is
a bit odd, as we use the PMC array as an anchor for kvm_pmc_to_vcpu,
while we could directly compute the correct address.

Get rid of the intermediate step and directly compute the target
vcpu.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index b7a5f75d008d..e3d5fe260dcc 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -405,11 +405,8 @@ void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
 static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work)
 {
struct kvm_vcpu *vcpu;
-   struct kvm_pmu *pmu;
-
-   pmu = container_of(work, struct kvm_pmu, overflow_work);
-   vcpu = kvm_pmc_to_vcpu(pmu->pmc);
 
+   vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work);
kvm_vcpu_kick(vcpu);
 }
 
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 08/16] KVM: arm64: PMU: Simplify setting a counter to a specific value

2022-11-13 Thread Marc Zyngier
kvm_pmu_set_counter_value() is pretty odd, as it tries to update
the counter value while taking into account the value that is
currently held by the running perf counter.

This is not only complicated, this is quite wrong. Nowhere in
the architecture is it said that the counter would be offset
by something that is pending. The counter should be updated
with the value set by SW, and start counting from there if
required.

Remove the odd computation and just assign the provided value
after having released the perf event (which is then restarted).

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index faab0f57a45d..ea0c8411641f 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -23,6 +23,7 @@ static LIST_HEAD(arm_pmus);
 static DEFINE_MUTEX(arm_pmus_lock);
 
 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
+static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc);
 
 static u32 kvm_pmu_event_mask(struct kvm *kvm)
 {
@@ -131,8 +132,10 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx, u64 val)
if (!kvm_vcpu_has_pmu(vcpu))
return;
 
+   kvm_pmu_release_perf_event(&vcpu->arch.pmu.pmc[select_idx]);
+
reg = counter_index_to_reg(select_idx);
-   __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, 
select_idx);
+   __vcpu_sys_reg(vcpu, reg) = val;
 
/* Recreate the perf event to reflect the updated sample_period */
kvm_pmu_create_perf_event(vcpu, select_idx);
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 04/16] KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow

2022-11-13 Thread Marc Zyngier
The PMU architecture makes a subtle difference between a 64bit
counter and a counter that has a 64bit overflow. This is for example
the case of the cycle counter, which can generate an overflow on
a 32bit boundary if PMCR_EL0.LC==0 despite the accumulation being
done on 64 bits.

Use this distinction in the few cases where it matters in the code,
as we will reuse this with PMUv3p5 long counters.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 43 ---
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 69b67ab3c4bf..d050143326b5 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -50,6 +50,11 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
  * @select_idx: The counter index
  */
 static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
+{
+   return (select_idx == ARMV8_PMU_CYCLE_IDX);
+}
+
+static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 
select_idx)
 {
return (select_idx == ARMV8_PMU_CYCLE_IDX &&
__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
@@ -57,7 +62,8 @@ static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 
select_idx)
 
 static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
 {
-   return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX);
+   return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX &&
+   !kvm_pmu_idx_has_64bit_overflow(vcpu, idx));
 }
 
 static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
@@ -97,7 +103,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx)
counter += perf_event_read_value(pmc->perf_event, &enabled,
 &running);
 
-   if (select_idx != ARMV8_PMU_CYCLE_IDX)
+   if (!kvm_pmu_idx_is_64bit(vcpu, select_idx))
counter = lower_32_bits(counter);
 
return counter;
@@ -423,6 +429,23 @@ static void kvm_pmu_counter_increment(struct kvm_vcpu 
*vcpu,
}
 }
 
+/* Compute the sample period for a given counter value */
+static u64 compute_period(struct kvm_vcpu *vcpu, u64 select_idx, u64 counter)
+{
+   u64 val;
+
+   if (kvm_pmu_idx_is_64bit(vcpu, select_idx)) {
+   if (!kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx))
+   val = -(counter & GENMASK(31, 0));
+   else
+   val = (-counter) & GENMASK(63, 0);
+   } else {
+   val = (-counter) & GENMASK(31, 0);
+   }
+
+   return val;
+}
+
 /**
  * When the perf event overflows, set the overflow status and inform the vcpu.
  */
@@ -442,10 +465,7 @@ static void kvm_pmu_perf_overflow(struct perf_event 
*perf_event,
 * Reset the sample period to the architectural limit,
 * i.e. the point where the counter overflows.
 */
-   period = -(local64_read(&perf_event->count));
-
-   if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
-   period &= GENMASK(31, 0);
+   period = compute_period(vcpu, idx, local64_read(&perf_event->count));
 
local64_set(&perf_event->hw.period_left, 0);
perf_event->attr.sample_period = period;
@@ -571,14 +591,13 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
*vcpu, u64 select_idx)
 
/*
 * If counting with a 64bit counter, advertise it to the perf
-* code, carefully dealing with the initial sample period.
+* code, carefully dealing with the initial sample period
+* which also depends on the overflow.
 */
-   if (kvm_pmu_idx_is_64bit(vcpu, select_idx)) {
+   if (kvm_pmu_idx_is_64bit(vcpu, select_idx))
attr.config1 |= PERF_ATTR_CFG1_COUNTER_64BIT;
-   attr.sample_period = (-counter) & GENMASK(63, 0);
-   } else {
-   attr.sample_period = (-counter) & GENMASK(31, 0);
-   }
+
+   attr.sample_period = compute_period(vcpu, select_idx, counter);
 
event = perf_event_create_kernel_counter(&attr, -1, current,
 kvm_pmu_perf_overflow, pmc);
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 06/16] KVM: arm64: PMU: Only narrow counters that are not 64bit wide

2022-11-13 Thread Marc Zyngier
The current PMU emulation sometimes narrows counters to 32bit
if the counter isn't the cycle counter. As this is going to
change with PMUv3p5 where the counters are all 64bit, fix
the couple of cases where this happens unconditionally.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 9e6bc7edc4de..1fab889dbc74 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -151,20 +151,17 @@ static void kvm_pmu_release_perf_event(struct kvm_pmc 
*pmc)
  */
 static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc)
 {
-   u64 counter, reg, val;
+   u64 reg, val;
 
if (!pmc->perf_event)
return;
 
-   counter = kvm_pmu_get_counter_value(vcpu, pmc->idx);
+   val = kvm_pmu_get_counter_value(vcpu, pmc->idx);
 
-   if (pmc->idx == ARMV8_PMU_CYCLE_IDX) {
+   if (pmc->idx == ARMV8_PMU_CYCLE_IDX)
reg = PMCCNTR_EL0;
-   val = counter;
-   } else {
+   else
reg = PMEVCNTR0_EL0 + pmc->idx;
-   val = lower_32_bits(counter);
-   }
 
__vcpu_sys_reg(vcpu, reg) = val;
 
@@ -414,7 +411,8 @@ static void kvm_pmu_counter_increment(struct kvm_vcpu *vcpu,
 
/* Increment this counter */
reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
-   reg = lower_32_bits(reg);
+   if (!kvm_pmu_idx_is_64bit(vcpu, i))
+   reg = lower_32_bits(reg);
__vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
 
/* No overflow? move on */
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 07/16] KVM: arm64: PMU: Add counter_index_to_*reg() helpers

2022-11-13 Thread Marc Zyngier
In order to reduce the boilerplate code, add two helpers returning
the counter register index (resp. the event register) in the vcpu
register file from the counter index.

Reviewed-by: Oliver Upton 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 1fab889dbc74..faab0f57a45d 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -77,6 +77,16 @@ static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
return container_of(vcpu_arch, struct kvm_vcpu, arch);
 }
 
+static u32 counter_index_to_reg(u64 idx)
+{
+   return (idx == ARMV8_PMU_CYCLE_IDX) ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + idx;
+}
+
+static u32 counter_index_to_evtreg(u64 idx)
+{
+   return (idx == ARMV8_PMU_CYCLE_IDX) ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + 
idx;
+}
+
 /**
  * kvm_pmu_get_counter_value - get PMU counter value
  * @vcpu: The vcpu pointer
@@ -91,8 +101,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx)
if (!kvm_vcpu_has_pmu(vcpu))
return 0;
 
-   reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
-   ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
+   reg = counter_index_to_reg(select_idx);
counter = __vcpu_sys_reg(vcpu, reg);
 
/*
@@ -122,8 +131,7 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx, u64 val)
if (!kvm_vcpu_has_pmu(vcpu))
return;
 
-   reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
- ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx;
+   reg = counter_index_to_reg(select_idx);
__vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, 
select_idx);
 
/* Recreate the perf event to reflect the updated sample_period */
@@ -158,10 +166,7 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
struct kvm_pmc *pmc)
 
val = kvm_pmu_get_counter_value(vcpu, pmc->idx);
 
-   if (pmc->idx == ARMV8_PMU_CYCLE_IDX)
-   reg = PMCCNTR_EL0;
-   else
-   reg = PMEVCNTR0_EL0 + pmc->idx;
+   reg = counter_index_to_reg(pmc->idx);
 
__vcpu_sys_reg(vcpu, reg) = val;
 
@@ -404,16 +409,16 @@ static void kvm_pmu_counter_increment(struct kvm_vcpu 
*vcpu,
u64 type, reg;
 
/* Filter on event type */
-   type = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i);
+   type = __vcpu_sys_reg(vcpu, counter_index_to_evtreg(i));
type &= kvm_pmu_event_mask(vcpu->kvm);
if (type != event)
continue;
 
/* Increment this counter */
-   reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
+   reg = __vcpu_sys_reg(vcpu, counter_index_to_reg(i)) + 1;
if (!kvm_pmu_idx_is_64bit(vcpu, i))
reg = lower_32_bits(reg);
-   __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
+   __vcpu_sys_reg(vcpu, counter_index_to_reg(i)) = reg;
 
/* No overflow? move on */
if (kvm_pmu_idx_has_64bit_overflow(vcpu, i) ? reg : 
lower_32_bits(reg))
@@ -549,8 +554,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu 
*vcpu, u64 select_idx)
struct perf_event_attr attr;
u64 eventsel, counter, reg, data;
 
-   reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
- ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + pmc->idx;
+   reg = counter_index_to_evtreg(select_idx);
data = __vcpu_sys_reg(vcpu, reg);
 
kvm_pmu_stop_counter(vcpu, pmc);
@@ -632,8 +636,7 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
mask &= ~ARMV8_PMU_EVTYPE_EVENT;
mask |= kvm_pmu_event_mask(vcpu->kvm);
 
-   reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
- ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx;
+   reg = counter_index_to_evtreg(select_idx);
 
__vcpu_sys_reg(vcpu, reg) = data & mask;
 
-- 
2.34.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v4 00/16] KVM: arm64: PMU: Fixing chained events, and PMUv3p5 support

2022-11-13 Thread Marc Zyngier
Ricardo reported[0] that our PMU emulation was busted when it comes to
chained events, as we cannot expose the overflow on a 32bit boundary
(which the architecture requires).

This series aims at fixing this (by deleting a lot of code), and as a
bonus adds support for PMUv3p5, as this requires us to fix a few more
things.

Tested on A53 (PMUv3) and QEMU (PMUv3p5).

* From v3 [3]:
  - Independent tracking of unimplemented and implemented revisions
  - Simplified counter enable/disable
  - Simplified the vcpu address computation on overflow notification
  - Added one patch to move from vcpu+index to pmc
  - Rebased on 6.1-rc3

* From v2 [2]:
  - Some tightening of userspace access to ID_{AA64,}DFR0_EL1

* From v1 [1]:
  - Rebased on 6.1-rc2
  - New patch advertising that we always support the CHAIN event
  - Plenty of bug fixes (idreg handling, AArch32, overflow narrowing)
  - Tons of cleanups
  - All kudos to Oliver and Reiji for spending the time to review this
mess, and Ricardo for finding more bugs!

[0] https://lore.kernel.org/r/20220805004139.990531-1-ricar...@google.com
[1] https://lore.kernel.org/r/20220805135813.2102034-1-...@kernel.org
[2] https://lore.kernel.org/r/20221028105402.2030192-1-...@kernel.org
[3] https://lore.kernel.org/r/20221107085435.2581641-1-...@kernel.org

Marc Zyngier (16):
  arm64: Add ID_DFR0_EL1.PerfMon values for PMUv3p7 and IMP_DEF
  KVM: arm64: PMU: Align chained counter implementation with
architecture pseudocode
  KVM: arm64: PMU: Always advertise the CHAIN event
  KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
  KVM: arm64: PMU: Narrow the overflow checking when required
  KVM: arm64: PMU: Only narrow counters that are not 64bit wide
  KVM: arm64: PMU: Add counter_index_to_*reg() helpers
  KVM: arm64: PMU: Simplify setting a counter to a specific value
  KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits
  KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation
  KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace
  KVM: arm64: PMU: Allow ID_DFR0_EL1.PerfMon to be set from userspace
  KVM: arm64: PMU: Implement PMUv3p5 long counter support
  KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest
  KVM: arm64: PMU: Simplify vcpu computation on perf overflow
notification
  KVM: arm64: PMU: Make kvm_pmc the main data structure

 arch/arm64/include/asm/kvm_host.h |   4 +
 arch/arm64/include/asm/sysreg.h   |   2 +
 arch/arm64/kvm/arm.c  |   6 +
 arch/arm64/kvm/pmu-emul.c | 475 --
 arch/arm64/kvm/sys_regs.c | 139 -
 include/kvm/arm_pmu.h |  15 +-
 6 files changed, 345 insertions(+), 296 deletions(-)

-- 
2.34.1



[PATCH v4 01/16] arm64: Add ID_DFR0_EL1.PerfMon values for PMUv3p7 and IMP_DEF

2022-11-13 Thread Marc Zyngier
Align the ID_DFR0_EL1.PerfMon values with ID_AA64DFR0_EL1.PMUver.

Reviewed-by: Oliver Upton 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/sysreg.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7d301700d1a9..84f59ce1dc6d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -698,6 +698,8 @@
 #define ID_DFR0_PERFMON_8_10x4
 #define ID_DFR0_PERFMON_8_40x5
 #define ID_DFR0_PERFMON_8_50x6
+#define ID_DFR0_PERFMON_8_70x7
+#define ID_DFR0_PERFMON_IMP_DEF0xf
 
 #define ID_ISAR4_SWP_FRAC_SHIFT28
 #define ID_ISAR4_PSR_M_SHIFT   24
-- 
2.34.1



[PATCH v4 02/16] KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode

2022-11-13 Thread Marc Zyngier
Ricardo recently pointed out that the PMU chained counter emulation
in KVM wasn't quite behaving like the one on actual hardware, in
the sense that a chained counter would expose an overflow on
both halves of a chained counter, while KVM would only expose the
overflow on the top half.

The difference is subtle, but significant. What does the architecture
say (DDI0487 H.a):

- Up to PMUv3p4, all counters but the cycle counter are 32bit

- A 32bit counter that overflows generates a CHAIN event on the
  adjacent counter after exposing its own overflow status

- The CHAIN event is accounted if the counter is correctly
  configured (CHAIN event selected and counter enabled)

This all means that our current implementation (which uses 64bit
perf events) prevents us from emulating this overflow on the lower half.

How to fix this? By implementing the above, to the letter.

This largely results in code deletion, removing the notions of
"counter pair", "chained counters", and "canonical counter".
The code is further restructured to make the CHAIN handling similar
to SWINC, as the two are now extremely similar in behaviour.
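
As an illustration of the architectural rule above, here is a toy,
self-contained sketch (names and structure are invented for the
example; this is not the KVM code in this patch):

#include <stdint.h>

#define NR_EVT_COUNTERS	31	/* event counters; the cycle counter is separate */
#define EVT_CHAIN	0x1e	/* ARMV8_PMUV3_PERFCTR_CHAIN */

struct toy_pmu {
	uint32_t counter[NR_EVT_COUNTERS];	/* up to PMUv3p4: 32bit each */
	uint16_t evtype[NR_EVT_COUNTERS];	/* selected event per counter */
	uint32_t cnten;				/* counter enable bits */
	uint32_t ovs;				/* overflow status bits */
};

static void toy_increment(struct toy_pmu *pmu, unsigned int idx)
{
	if (++pmu->counter[idx])
		return;

	/* The 32bit counter wrapped: expose its own overflow first... */
	pmu->ovs |= 1U << idx;

	/*
	 * ...then account a CHAIN event on the odd-numbered neighbour,
	 * but only if that neighbour is enabled and counts CHAIN.
	 */
	if (!(idx & 1) && idx + 1 < NR_EVT_COUNTERS &&
	    (pmu->cnten & (1U << (idx + 1))) &&
	    pmu->evtype[idx + 1] == EVT_CHAIN)
		toy_increment(pmu, idx + 1);
}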

Reported-by: Ricardo Koller 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 320 ++
 include/kvm/arm_pmu.h |   2 -
 2 files changed, 86 insertions(+), 236 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 0003c7d37533..57765be69bea 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -15,16 +15,14 @@
 #include 
 #include 
 
+#define PERF_ATTR_CFG1_COUNTER_64BIT   BIT(0)
+
 DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
 
 static LIST_HEAD(arm_pmus);
 static DEFINE_MUTEX(arm_pmus_lock);
 
 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
-static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx);
-static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc);
-
-#define PERF_ATTR_CFG1_KVM_PMU_CHAINED 0x1
 
 static u32 kvm_pmu_event_mask(struct kvm *kvm)
 {
@@ -57,6 +55,11 @@ static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 
select_idx)
__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
 }
 
+static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
+{
+   return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX);
+}
+
 static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
 {
struct kvm_pmu *pmu;
@@ -69,91 +72,22 @@ static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
 }
 
 /**
- * kvm_pmu_pmc_is_chained - determine if the pmc is chained
- * @pmc: The PMU counter pointer
- */
-static bool kvm_pmu_pmc_is_chained(struct kvm_pmc *pmc)
-{
-   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
-
-   return test_bit(pmc->idx >> 1, vcpu->arch.pmu.chained);
-}
-
-/**
- * kvm_pmu_idx_is_high_counter - determine if select_idx is a high/low counter
- * @select_idx: The counter index
- */
-static bool kvm_pmu_idx_is_high_counter(u64 select_idx)
-{
-   return select_idx & 0x1;
-}
-
-/**
- * kvm_pmu_get_canonical_pmc - obtain the canonical pmc
- * @pmc: The PMU counter pointer
- *
- * When a pair of PMCs are chained together we use the low counter (canonical)
- * to hold the underlying perf event.
- */
-static struct kvm_pmc *kvm_pmu_get_canonical_pmc(struct kvm_pmc *pmc)
-{
-   if (kvm_pmu_pmc_is_chained(pmc) &&
-   kvm_pmu_idx_is_high_counter(pmc->idx))
-   return pmc - 1;
-
-   return pmc;
-}
-static struct kvm_pmc *kvm_pmu_get_alternate_pmc(struct kvm_pmc *pmc)
-{
-   if (kvm_pmu_idx_is_high_counter(pmc->idx))
-   return pmc - 1;
-   else
-   return pmc + 1;
-}
-
-/**
- * kvm_pmu_idx_has_chain_evtype - determine if the event type is chain
+ * kvm_pmu_get_counter_value - get PMU counter value
  * @vcpu: The vcpu pointer
  * @select_idx: The counter index
  */
-static bool kvm_pmu_idx_has_chain_evtype(struct kvm_vcpu *vcpu, u64 select_idx)
-{
-   u64 eventsel, reg;
-
-   select_idx |= 0x1;
-
-   if (select_idx == ARMV8_PMU_CYCLE_IDX)
-   return false;
-
-   reg = PMEVTYPER0_EL0 + select_idx;
-   eventsel = __vcpu_sys_reg(vcpu, reg) & kvm_pmu_event_mask(vcpu->kvm);
-
-   return eventsel == ARMV8_PMUV3_PERFCTR_CHAIN;
-}
-
-/**
- * kvm_pmu_get_pair_counter_value - get PMU counter value
- * @vcpu: The vcpu pointer
- * @pmc: The PMU counter pointer
- */
-static u64 kvm_pmu_get_pair_counter_value(struct kvm_vcpu *vcpu,
- struct kvm_pmc *pmc)
+u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
 {
-   u64 counter, counter_high, reg, enabled, running;
-
-   if (kvm_pmu_pmc_is_chained(pmc)) {
-   pmc = kvm_pmu_get_canonical_pmc(pmc);
-   reg = PMEVCNTR0_EL0 + pmc->idx;
+   u64 counter, reg, enab

[PATCH v4 05/16] KVM: arm64: PMU: Narrow the overflow checking when required

2022-11-13 Thread Marc Zyngier
For 64bit counters that overflow on a 32bit boundary, make
sure we only check the bottom 32bit to generate a CHAIN event.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index d050143326b5..9e6bc7edc4de 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -417,7 +417,8 @@ static void kvm_pmu_counter_increment(struct kvm_vcpu *vcpu,
reg = lower_32_bits(reg);
__vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
 
-   if (reg) /* No overflow? move on */
+   /* No overflow? move on */
+   if (kvm_pmu_idx_has_64bit_overflow(vcpu, i) ? reg : 
lower_32_bits(reg))
continue;
 
/* Mark overflow */
-- 
2.34.1



[PATCH v4 03/16] KVM: arm64: PMU: Always advertise the CHAIN event

2022-11-13 Thread Marc Zyngier
Even when the underlying HW doesn't offer the CHAIN event
(which happens with QEMU), we can always support it as we're
in control of the counter overflow.

Always advertise the event via PMCEID0_EL0.

Reviewed-by: Reiji Watanabe 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 57765be69bea..69b67ab3c4bf 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -701,6 +701,8 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
 
if (!pmceid1) {
val = read_sysreg(pmceid0_el0);
+   /* always support CHAIN */
+   val |= BIT(ARMV8_PMUV3_PERFCTR_CHAIN);
base = 0;
} else {
val = read_sysreg(pmceid1_el0);
-- 
2.34.1



[PATCH v4 09/16] KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits

2022-11-13 Thread Marc Zyngier
Even when using PMUv3p5 (which implies 64bit counters), there is
no way for AArch32 to write to the top 32 bits of the counters.
The only way to influence these bits (other than by counting
events) is by writing PMCR.P==1.

Make sure we obey the architecture and preserve the top 32 bits
on a counter update.
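
As a standalone illustration of the resulting behaviour (example
values only, not the patch code itself):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t counter = 0x00000001deadbeefULL;	/* current 64bit counter */
	uint32_t aarch32_write = 0;			/* value written by the AArch32 guest */

	/* Only the bottom half is replaced; the top half is preserved */
	counter = (counter & 0xffffffff00000000ULL) | aarch32_write;
	printf("counter = 0x%016llx\n", (unsigned long long)counter);
	/* prints counter = 0x0000000100000000 */
	return 0;
}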

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/pmu-emul.c | 35 +++
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index ea0c8411641f..419e5e0a13d0 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -119,13 +119,8 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx)
return counter;
 }
 
-/**
- * kvm_pmu_set_counter_value - set PMU counter value
- * @vcpu: The vcpu pointer
- * @select_idx: The counter index
- * @val: The counter value
- */
-void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val)
+static void kvm_pmu_set_counter(struct kvm_vcpu *vcpu, u64 select_idx, u64 val,
+   bool force)
 {
u64 reg;
 
@@ -135,12 +130,36 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx, u64 val)
kvm_pmu_release_perf_event(&vcpu->arch.pmu.pmc[select_idx]);
 
reg = counter_index_to_reg(select_idx);
+
+   if (vcpu_mode_is_32bit(vcpu) && select_idx != ARMV8_PMU_CYCLE_IDX &&
+   !force) {
+   /*
+* Even with PMUv3p5, AArch32 cannot write to the top
+* 32bit of the counters. The only possible course of
+* action is to use PMCR.P, which will reset them to
+* 0 (the only use of the 'force' parameter).
+*/
+   val  = lower_32_bits(val);
+   val |= upper_32_bits(__vcpu_sys_reg(vcpu, reg));
+   }
+
__vcpu_sys_reg(vcpu, reg) = val;
 
/* Recreate the perf event to reflect the updated sample_period */
kvm_pmu_create_perf_event(vcpu, select_idx);
 }
 
+/**
+ * kvm_pmu_set_counter_value - set PMU counter value
+ * @vcpu: The vcpu pointer
+ * @select_idx: The counter index
+ * @val: The counter value
+ */
+void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val)
+{
+   kvm_pmu_set_counter(vcpu, select_idx, val, false);
+}
+
 /**
  * kvm_pmu_release_perf_event - remove the perf event
  * @pmc: The PMU counter pointer
@@ -533,7 +552,7 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
mask &= ~BIT(ARMV8_PMU_CYCLE_IDX);
for_each_set_bit(i, &mask, 32)
-   kvm_pmu_set_counter_value(vcpu, i, 0);
+   kvm_pmu_set_counter(vcpu, i, 0, true);
}
 }
 
-- 
2.34.1



Re: [PATCH v2 11/14] KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace

2022-11-13 Thread Marc Zyngier

On 2022-11-08 05:36, Reiji Watanabe wrote:

Hi Marc,


> BTW, if we have no intention of supporting a mix of vCPUs with and
> without PMU, I think it would be nice if we had a clear comment on
> that in the code. Or I'm hoping to disallow it if possible, though.

I'm not sure we're in a position to do this right now. The current API
has always (for good or bad reasons) been per-vcpu as it is tied to
the vcpu initialisation.


Thank you for your comments!
Then, for a guest that has a mix of vCPUs with and without PMU,
userspace can set kvm->arch.dfr0_pmuver to zero or IMPDEF, and the
PMUVER for the vCPUs that do have a PMU will become 0 or IMPDEF, as I
mentioned. For instance, on a host whose PMUVER==1, suppose vCPU#0 has
no PMU (PMUVER==0) and vCPU#1 has a PMU (PMUVER==1). If the guest is
migrated to another host with the same CPU features (PMUVER==1), and
SET_ONE_REG of ID_AA64DFR0_EL1 for vCPU#0 is done after the one for
vCPU#1, kvm->arch.dfr0_pmuver will be set to 0, and the guest will see
PMUVER==0 even for vCPU#1.

Should we be concerned about this case?


Yeah, this is a real problem. The issue is that we want to keep
track of two separate bits of information:

- what is the revision of the PMU when the PMU is supported?
- when no PMU is supported, do we expose it as absent or as IMPDEF?

and we use the same field for both, which clearly cannot work
if we allow vcpus with and without PMUs in the same VM.

I've now switched to an implementation where I track both
the architected version as well as the version exposed when
no PMU is supported, see below.

We still cannot track both no-PMU *and* impdef-PMU, nor can we
track multiple PMU revisions. But that's not a thing as far as
I am concerned.

Thanks,

M.

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h

index 90c9a2dd3f26..cc44e3bc528d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -163,7 +163,10 @@ struct kvm_arch {

u8 pfr0_csv2;
u8 pfr0_csv3;
-   u8 dfr0_pmuver;
+   struct {
+   u8 imp:4;
+   u8 unimp:4;
+   } dfr0_pmuver;

/* Hypercall features firmware registers' descriptor */
struct kvm_smccc_features smccc_feat;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6b3ed524630d..f956aab438c7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -168,7 +168,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long 
type)

 * Initialise the default PMUver before there is a chance to
 * create an actual PMU.
 */
-   kvm->arch.dfr0_pmuver = kvm_arm_pmu_get_pmuver_limit();
+   kvm->arch.dfr0_pmuver.imp = kvm_arm_pmu_get_pmuver_limit();

return ret;
 out_free_stage2_pgd:
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 95100896de72..615cb148e22a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1069,14 +1069,9 @@ static bool access_arch_timer(struct kvm_vcpu 
*vcpu,

 static u8 vcpu_pmuver(const struct kvm_vcpu *vcpu)
 {
if (kvm_vcpu_has_pmu(vcpu))
-   return vcpu->kvm->arch.dfr0_pmuver;
+   return vcpu->kvm->arch.dfr0_pmuver.imp;

-   /* Special case for IMPDEF PMUs that KVM has exposed in the past... */
-   if (vcpu->kvm->arch.dfr0_pmuver == ID_AA64DFR0_EL1_PMUVer_IMP_DEF)
-   return ID_AA64DFR0_EL1_PMUVer_IMP_DEF;
-
-   /* The real "no PMU" */
-   return 0;
+   return vcpu->kvm->arch.dfr0_pmuver.unimp;
 }

 static u8 perfmon_to_pmuver(u8 perfmon)
@@ -1295,7 +1290,10 @@ static int set_id_aa64dfr0_el1(struct kvm_vcpu 
*vcpu,

if (val)
return -EINVAL;

-   vcpu->kvm->arch.dfr0_pmuver = pmuver;
+   if (valid_pmu)
+   vcpu->kvm->arch.dfr0_pmuver.imp = pmuver;
+   else
+   vcpu->kvm->arch.dfr0_pmuver.unimp = pmuver;

return 0;
 }
@@ -1332,7 +1330,10 @@ static int set_id_dfr0_el1(struct kvm_vcpu *vcpu,
if (val)
return -EINVAL;

-   vcpu->kvm->arch.dfr0_pmuver = perfmon_to_pmuver(perfmon);
+   if (valid_pmu)
+   vcpu->kvm->arch.dfr0_pmuver.imp = perfmon_to_pmuver(perfmon);
+   else
+   vcpu->kvm->arch.dfr0_pmuver.unimp = perfmon_to_pmuver(perfmon);

return 0;
 }
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 3d526df9f3c5..628775334d5e 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -93,7 +93,7 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
  * Evaluates as true when emulating PMUv3p5, and false otherwise.
  */
 #define kvm_pmu_is_3p5(vcpu)   \
-   (vcpu->kvm->arch.dfr0_pmuver >= ID_AA64DFR0_EL1_PMUVer_V3P5)
+   (vcpu->kvm->arch.dfr0_pmuver.imp >= ID_AA64DFR0_EL1_PMUVer_V3P5)

 u8 kvm_arm_pmu_get_pmuver_limit(void);

--
Jazz is not dead. It just smells funny...

Re: [PATCH v3 11/14] KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace

2022-11-13 Thread Marc Zyngier

On 2022-11-08 05:38, Reiji Watanabe wrote:

Hi Marc,

On Mon, Nov 7, 2022 at 1:16 AM Marc Zyngier  wrote:


Allow userspace to write ID_AA64DFR0_EL1, on the condition that only
the PMUver field can be altered and be at most the one that was
initially computed for the guest.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/sys_regs.c | 40 
++-

 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7a4cd644b9c0..47c882401f3c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1247,6 +1247,43 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu 
*vcpu,

return 0;
 }

+static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
+  const struct sys_reg_desc *rd,
+  u64 val)
+{
+   u8 pmuver, host_pmuver;
+   bool valid_pmu;
+
+   host_pmuver = kvm_arm_pmu_get_pmuver_limit();
+
+   /*
+* Allow AA64DFR0_EL1.PMUver to be set from userspace as long
+* as it doesn't promise more than what the HW gives us. We
+* allow an IMPDEF PMU though, only if no PMU is supported
+* (KVM backward compatibility handling).
+*/
+   pmuver = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), 
val);
+   if ((pmuver != ID_AA64DFR0_EL1_PMUVer_IMP_DEF && pmuver > 
host_pmuver) ||

+   (pmuver != 0 && pmuver < ID_AA64DFR0_EL1_PMUVer_IMP))


Nit: Since this second condition cannot be true (right?), perhaps it
might be rather confusing? I wasn't able to understand what it meant
until I saw the equivalent check in set_id_dfr0_el1() (maybe just me
though :).


Ah, that's just me being tainted with the AArch32 version which
doesn't start at 1 for PMUv3. I'll drop it.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v3 02/14] KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode

2022-11-12 Thread Marc Zyngier
Hi Reiji,

On Sat, 12 Nov 2022 07:55:38 +,
Reiji Watanabe  wrote:
> 
> Hi Marc,
> 
> On Mon, Nov 7, 2022 at 12:54 AM Marc Zyngier  wrote:
> >
> > Ricardo recently pointed out that the PMU chained counter emulation
> > in KVM wasn't quite behaving like the one on actual hardware, in
> > the sense that a chained counter would expose an overflow on
> > both halves of a chained counter, while KVM would only expose the
> > overflow on the top half.
> >
> > The difference is subtle, but significant. What does the architecture
> > say (DDI0087 H.a):
> >
> > - Up to PMUv3p4, all counters but the cycle counter are 32bit
> >
> > - A 32bit counter that overflows generates a CHAIN event on the
> >   adjacent counter after exposing its own overflow status
> >
> > - The CHAIN event is accounted if the counter is correctly
> >   configured (CHAIN event selected and counter enabled)
> >
> > This all means that our current implementation (which uses 64bit
> > perf events) prevents us from emulating this overflow on the lower half.
> >
> > How to fix this? By implementing the above, to the letter.
> >
> > This largly results in code deletion, removing the notions of
> > "counter pair", "chained counters", and "canonical counter".
> > The code is further restructured to make the CHAIN handling similar
> > to SWINC, as the two are now extremely similar in behaviour.
> >
> > Reported-by: Ricardo Koller 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/pmu-emul.c | 312 ++
> >  include/kvm/arm_pmu.h |   2 -
> >  2 files changed, 83 insertions(+), 231 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > index 0003c7d37533..a38b3127f649 100644
> > --- a/arch/arm64/kvm/pmu-emul.c
> > +++ b/arch/arm64/kvm/pmu-emul.c
> > @@ -15,16 +15,14 @@
> >  #include 
> >  #include 
> >
> > +#define PERF_ATTR_CFG1_COUNTER_64BIT   BIT(0)
> 
> Although this isn't the new code (but just a name change),
> wouldn't it be nicer to have armv8pmu_event_is_64bit()
> (in arch/arm64/kernel/perf_event.c) use the macro as well ?

We tried that in the past, and the amount of churn wasn't really worth
it. I'm happy to revisit this in the future, but probably as a
separate patch.

[...]

> > @@ -163,29 +97,7 @@ static u64 kvm_pmu_get_pair_counter_value(struct 
> > kvm_vcpu *vcpu,
> > counter += perf_event_read_value(pmc->perf_event, &enabled,
> >  &running);
> >
> > -   return counter;
> > -}
> > -
> > -/**
> > - * kvm_pmu_get_counter_value - get PMU counter value
> > - * @vcpu: The vcpu pointer
> > - * @select_idx: The counter index
> > - */
> > -u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
> > -{
> > -   u64 counter;
> > -   struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > -   struct kvm_pmc *pmc = &pmu->pmc[select_idx];
> > -
> > -   if (!kvm_vcpu_has_pmu(vcpu))
> > -   return 0;
> > -
> > -   counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
> > -
> > -   if (kvm_pmu_pmc_is_chained(pmc) &&
> > -   kvm_pmu_idx_is_high_counter(select_idx))
> > -   counter = upper_32_bits(counter);
> > -   else if (select_idx != ARMV8_PMU_CYCLE_IDX)
> > +   if (select_idx != ARMV8_PMU_CYCLE_IDX)
> 
> Nit:Using 'pmc->idx' instead of 'select_idx' appears to be more consistent.

Well, this is the exact opposite of Oliver's comment last time. I
initially used pmc->idx, but it made the diff somehow larger and also
more difficult to understand what changed.

In the end, I'd rather rework the whole file to consistently use
vcpu+idx or pmc, as the mixed use of both is annoying. And that's
probably a cleanup patch for later.

[...]

> > @@ -340,11 +245,8 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu 
> > *vcpu, u64 val)
> >
> > pmc = &pmu->pmc[i];
> >
> > -   /* A change in the enable state may affect the chain state 
> > */
> > -   kvm_pmu_update_pmc_chained(vcpu, i);
> > kvm_pmu_create_perf_event(vcpu, i);
> >
> > -   /* At this point, pmc must be the canonical */
> > if (pmc->perf_event) {
> > perf_event_enable(pmc->perf_event);
> > if (pmc->perf_event->state != 
> > PERF_EVENT_STATE_ACTIVE)
> > @@ -375,11 +277,8

Re: [PATCH v10 3/7] KVM: Support dirty ring in conjunction with bitmap

2022-11-11 Thread Marc Zyngier

On 2022-11-11 22:19, Gavin Shan wrote:

Hi Marc,

On 11/11/22 11:19 PM, Marc Zyngier wrote:

On Thu, 10 Nov 2022 23:47:41 +,
Gavin Shan  wrote:


commit b05377ecbe003f12c8b79846fa3a300401dcab68 (HEAD -> 
kvm/arm64_dirtyring)

Author: Gavin Shan 
Date:   Fri Nov 11 07:13:12 2022 +0800

 KVM: Push dirty information unconditionally to backup bitmap

 In mark_page_dirty_in_slot(), we bail out when no running vcpu exists
 and a running vcpu context is strictly required by the architecture.
 It may cause backward compatibility issues. Currently, saving vgic/its
 tables is the only case where no running vcpu context is required. We
 may have other unknown cases where no running vcpu context exists;
 those are reported by the warning message. For such cases, the
 application is going to enable the backup bitmap. However, the dirty
 information can't be pushed to the backup bitmap even though the
 backup bitmap has been enabled, until the unknown cases are added to
 the allowed list of non-running vcpu contexts with extra code changes
 to the host kernel.

 In order to make the new application, where the backup bitmap has been
 enabled, work with an unchanged host, we continue to push the dirty
 information to the backup bitmap instead of bailing out early.

 Suggested-by: Sean Christopherson 
 Signed-off-by: Gavin Shan 

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2719e10dd37d..03e6a38094c1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3308,8 +3308,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
 		return;
 
-	if (WARN_ON_ONCE(!kvm_arch_allow_write_without_running_vcpu(kvm) && !vcpu))
-		return;
+	WARN_ON_ONCE(!vcpu && !kvm_arch_allow_write_without_running_vcpu(kvm));


I'm happy with this.



Thanks, it's the primary change in this patch.


  #endif
  if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
@@ -3318,7 +3317,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
  if (kvm->dirty_ring_size && vcpu)
 kvm_dirty_ring_push(vcpu, slot, rel_gfn);
-   else
+   else if (memslot->dirty_bitmap)
 set_bit_le(rel_gfn, memslot->dirty_bitmap);


But that I don't get. Or rather, I don't get the commit message that
matches this hunk. Do we want to catch the case where all of the
following are true:

- we don't have a vcpu,
- we're allowed to log non-vcpu dirtying
- we *only* have the ring?

If so, can we please capture that in the commit message?



Nice catch! This particular case needs to be warned about explicitly.
Without the patch, a kernel crash is triggered. With this patch
applied, the error or warning is dropped silently. We can either check
memslot->dirty_bitmap in mark_page_dirty_in_slot(), or check it in
kvm_arch_allow_write_without_running_vcpu(). I personally prefer the
latter one. Let me post a formal patch on top of your 'next' branch
where the commit log will be improved accordingly.


I personally prefer the memslot->dirty_bitmap check, as this is
a completely legal case (the VMM may not want to track the
ITS dirtying).
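
For reference, here is roughly how mark_page_dirty_in_slot() reads
with both hunks above folded in (a sketch assembled from the quoted
diffs, not necessarily the final committed form):

void mark_page_dirty_in_slot(struct kvm *kvm,
			     const struct kvm_memory_slot *memslot,
			     gfn_t gfn)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

#ifdef CONFIG_HAVE_KVM_DIRTY_RING
	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
		return;

	/* Warn on the unexpected case, but let the backup bitmap catch it */
	WARN_ON_ONCE(!vcpu && !kvm_arch_allow_write_without_running_vcpu(kvm));
#endif

	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
		unsigned long rel_gfn = gfn - memslot->base_gfn;
		u32 slot = (memslot->as_id << 16) | memslot->id;

		if (kvm->dirty_ring_size && vcpu)
			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
		else if (memslot->dirty_bitmap)
			set_bit_le(rel_gfn, memslot->dirty_bitmap);
	}
}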

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: arm64: Update comment of create_hyp_mappings

2022-11-11 Thread Marc Zyngier
On Thu, 03 Nov 2022 05:32:28 +,
Wei-Lin Chang  wrote:
> 
> HYP_PAGE_OFFSET was removed in 4.8, and the method for generating Hyp
> VAs has evolved since. Update the functional description of
> create_hyp_mappings accordingly.
> 
> Signed-off-by: Wei-Lin Chang 
> ---
>  arch/arm64/kvm/mmu.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c9a13e487..a9ae4a3f9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -424,9 +424,10 @@ void kvm_unshare_hyp(void *from, void *to)
>   * @to:  The virtual kernel end address of the range (exclusive)
>   * @prot:The protection to be applied to this range
>   *
> - * The same virtual address as the kernel virtual address is also used
> - * in Hyp-mode mapping (modulo HYP_PAGE_OFFSET) to the same underlying
> - * physical pages.
> + * The Hyp virtual address is generated by masking the kernel VA with
> + * va_mask then inserting tag_val for the higher bits starting from
> + * tag_lsb. See kvm_compute_layout() in va_layout.c for more info.
> + * Both Hyp VA and kernel VA ranges map to the same underlying physical 
> pages.

My problem with this comment is that neither va_mask, tag_val, nor
tag_lsb mean anything in this context. All this is purely internal to
kvm_compute_layout(), and is unnecessary here.

I'd rather you have something along the lines of:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..6d04818a1a5b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -460,7 +460,7 @@ void kvm_unshare_hyp(void *from, void *to)
  * @prot:  The protection to be applied to this range
  *
  * The same virtual address as the kernel virtual address is also used
- * in Hyp-mode mapping (modulo HYP_PAGE_OFFSET) to the same underlying
+ * in Hyp-mode mapping (modulo a random offset) to the same underlying
  * physical pages.
  */
 int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)

Whoever is interested in understanding the generation of the offset
can follow kern_hyp_va().
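
For the curious, the transformation amounts to something like this in
plain C (an illustration of the scheme only, reusing the variable
names from this thread; the kernel does it with runtime-patched
assembly in kern_hyp_va(), and the three values are computed once by
kvm_compute_layout()):

#include <stdint.h>

static uint64_t va_mask;	/* keeps the low, untranslated address bits */
static uint64_t tag_val;	/* (partly random) tag for the upper bits */
static unsigned int tag_lsb;	/* first bit covered by the tag */

static inline uint64_t hyp_va_sketch(uint64_t kern_va)
{
	return (kern_va & va_mask) | (tag_val << tag_lsb);
}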

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.

