Re: [PATCH v3 6/7] KVM: arm64: Participate in bitmap-based PTE aging

2024-04-02 Thread Marc Zyngier
On Tue, 02 Apr 2024 05:06:56 +0100,
Yu Zhao  wrote:
> 
> On Mon, Apr 1, 2024 at 7:30 PM James Houghton  wrote:
> >
> > Participate in bitmap-based aging while grabbing the KVM MMU lock for
> > reading. Ideally we wouldn't need to grab this lock at all, but that
> > would require a more intrustive and risky change.
>^^ intrusive
> This sounds subjective -- I'd just present the challenges and let
> reviewers make their own judgements.

Quite the opposite.

This sort of comment actually indicates that the author has at least
understood some of the complexity behind the proposed changes. It is a
qualitative comment that conveys useful information to reviewers, and
even more to the maintainers of this code.

That's the difference between a human developer and a bot, and I'm not
overly fond of bots.

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH v3 6/7] KVM: arm64: Participate in bitmap-based PTE aging

2024-04-02 Thread Oliver Upton
On Tue, Apr 02, 2024 at 12:06:56AM -0400, Yu Zhao wrote:
> On Mon, Apr 1, 2024 at 7:30 PM James Houghton  wrote:
> > Suggested-by: Yu Zhao 
> 
> Thanks but I did not suggest this.

Entirely up to you, but I would still want to credit everyone who
contributed to a feature even if the underlying implementation has
changed since the original attempt.

> What I have in v2 is RCU based. I hope Oliver or someone else can help
> make that work.

Why? If there's data to show that RCU has a material benefit over taking
the MMU lock for read then I'm all for it. Otherwise, the work required
to support page-table walkers entirely outside of the MMU lock isn't
justified.

In addition to ensuring that page table teardown is always RCU-safe,
we'd need to make sure all of the walkers that take the write lock are
prepared to handle races.

> Otherwise we can just drop this for now and revisit
> later.

I really wouldn't get hung up on the locking as the make-or-break for
whether arm64 supports this MMU notifier.

-- 
Thanks,
Oliver



Re: [PATCH v3 6/7] KVM: arm64: Participate in bitmap-based PTE aging

2024-04-01 Thread Yu Zhao
On Mon, Apr 1, 2024 at 7:30 PM James Houghton  wrote:
>
> Participate in bitmap-based aging while grabbing the KVM MMU lock for
> reading. Ideally we wouldn't need to grab this lock at all, but that
> would require a more intrustive and risky change.
   ^^ intrusive
This sounds subjective -- I'd just present the challenges and let
reviewers make their own judgements.

> Also pass
> KVM_PGTABLE_WALK_SHARED, as this software walker is safe to run in
> parallel with other walkers.
>
> It is safe only to grab the KVM MMU lock for reading as the kvm_pgtable
> is destroyed while holding the lock for writing, and freeing of the page
> table pages is either done while holding the MMU lock for writing or
> after an RCU grace period.
>
> When mkold == false, record the young pages in the passed-in bitmap.
>
> When mkold == true, only age the pages that need aging according to the
> passed-in bitmap.
>
> Suggested-by: Yu Zhao 

Thanks but I did not suggest this.

What I have in v2 is RCU based. I hope Oliver or someone else can help
make that work. Otherwise we can just drop this for now and revisit
later.

(I have no problems with this patch if the Arm folks think the
RCU-based version doesn't have a good ROI.)



[PATCH v3 6/7] KVM: arm64: Participate in bitmap-based PTE aging

2024-04-01 Thread James Houghton
Participate in bitmap-based aging while grabbing the KVM MMU lock for
reading. Ideally we wouldn't need to grab this lock at all, but that
would require a more intrustive and risky change. Also pass
KVM_PGTABLE_WALK_SHARED, as this software walker is safe to run in
parallel with other walkers.

It is safe only to grab the KVM MMU lock for reading as the kvm_pgtable
is destroyed while holding the lock for writing, and freeing of the page
table pages is either done while holding the MMU lock for writing or
after an RCU grace period.

When mkold == false, record the young pages in the passed-in bitmap.

When mkold == true, only age the pages that need aging according to the
passed-in bitmap.

Suggested-by: Yu Zhao 
Signed-off-by: James Houghton 
---
 arch/arm64/include/asm/kvm_host.h|  5 +
 arch/arm64/include/asm/kvm_pgtable.h |  4 +++-
 arch/arm64/kvm/hyp/pgtable.c | 21 ++---
 arch/arm64/kvm/mmu.c | 23 +--
 4 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 9e8a496fb284..e503553cb356 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1331,4 +1331,9 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
(get_idreg_field((kvm), id, fld) >= expand_field_sign(id, fld, min) && \
 get_idreg_field((kvm), id, fld) <= expand_field_sign(id, fld, max))
 
+#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age
+bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn);
+#define kvm_arch_finish_bitmap_age kvm_arch_finish_bitmap_age
+void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
b/arch/arm64/include/asm/kvm_pgtable.h
index 19278dfe7978..1976b4e26188 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -644,6 +644,7 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable 
*pgt, u64 addr);
  * @addr:  Intermediate physical address to identify the page-table entry.
  * @size:  Size of the address range to visit.
  * @mkold: True if the access flag should be cleared.
+ * @range: The kvm_gfn_range that is being used for the memslot walker.
  *
  * The offset of @addr within a page is ignored.
  *
@@ -657,7 +658,8 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable 
*pgt, u64 addr);
  * Return: True if any of the visited PTEs had the access flag set.
  */
 bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
-u64 size, bool mkold);
+u64 size, bool mkold,
+struct kvm_gfn_range *range);
 
 /**
  * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3fae5830f8d2..e881d3595aca 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1281,6 +1281,7 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable 
*pgt, u64 addr)
 }
 
 struct stage2_age_data {
+   struct kvm_gfn_range *range;
boolmkold;
boolyoung;
 };
@@ -1290,20 +1291,24 @@ static int stage2_age_walker(const struct 
kvm_pgtable_visit_ctx *ctx,
 {
kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
struct stage2_age_data *data = ctx->arg;
+   gfn_t gfn = ctx->addr / PAGE_SIZE;
 
if (!kvm_pte_valid(ctx->old) || new == ctx->old)
return 0;
 
data->young = true;
 
+
/*
-* stage2_age_walker() is always called while holding the MMU lock for
-* write, so this will always succeed. Nonetheless, this deliberately
-* follows the race detection pattern of the other stage-2 walkers in
-* case the locking mechanics of the MMU notifiers is ever changed.
+* stage2_age_walker() may not be holding the MMU lock for write, so
+* follow the race detection pattern of the other stage-2 walkers.
 */
-   if (data->mkold && !stage2_try_set_pte(ctx, new))
-   return -EAGAIN;
+   if (data->mkold) {
+   if (kvm_gfn_should_age(data->range, gfn) &&
+   !stage2_try_set_pte(ctx, new))
+   return -EAGAIN;
+   } else
+   kvm_gfn_record_young(data->range, gfn);
 
/*
 * "But where's the TLBI?!", you scream.
@@ -1315,10 +1320,12 @@ static int stage2_age_walker(const struct 
kvm_pgtable_visit_ctx *ctx,
 }
 
 bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
-u64 size, bool mkold)
+u64 size, bool mkold,
+struct kvm_gfn_range *range)
 {
struct stage2_age_data