From: Paul Mackerras <pau...@ozlabs.org>

[ Upstream commit 11362b1befeadaae4d159a8cddcdaf6b8afe08f9 ]

There is a potential race condition between hypervisor page faults
and flushing a memslot.  It is possible for a page fault to read the
memslot before that memslot is updated and then write a PTE to the
partition-scoped page tables after kvmppc_radix_flush_memslot has
completed.  (Note that this race has never been explicitly observed.)

To close this race, it is sufficient to increment the MMU sequence
number while the kvm->mmu_lock is held.  That will cause
mmu_notifier_retry() to return true, and the page fault will then
return to the guest without inserting a PTE.

Signed-off-by: Paul Mackerras <pau...@ozlabs.org>
Signed-off-by: Sasha Levin <sas...@kernel.org>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 +++++
 1 file changed, 5 insertions(+)
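
Illustration only, not part of the patch: the retry scheme described in
the commit message can be sketched as a small single-threaded user-space
model.  Every toy_* name below is hypothetical and does not exist in the
kernel; the real fault path samples kvm->mmu_notifier_seq and later calls
mmu_notifier_retry() under kvm->mmu_lock, and that helper may check more
than the sequence number.  The sketch assumes the lock can be modelled by
a plain flag and leaves out memory barriers entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few fields of struct kvm that matter here. */
struct toy_kvm {
    bool mmu_lock_held;      /* models kvm->mmu_lock (no real locking) */
    unsigned long mmu_seq;   /* models kvm->mmu_notifier_seq */
    int ptes_installed;      /* models PTEs in the partition-scoped tables */
};

static void toy_lock(struct toy_kvm *kvm)
{
    assert(!kvm->mmu_lock_held);
    kvm->mmu_lock_held = true;
}

static void toy_unlock(struct toy_kvm *kvm)
{
    assert(kvm->mmu_lock_held);
    kvm->mmu_lock_held = false;
}

/* Fault path, step 1: sample the sequence number before reading the memslot. */
static unsigned long toy_fault_begin(const struct toy_kvm *kvm)
{
    return kvm->mmu_seq;
}

/*
 * Fault path, step 2: take the lock and install a PTE only if the sequence
 * number is unchanged since step 1.  Returns true if a PTE was installed,
 * false if the fault must be retried.
 */
static bool toy_fault_commit(struct toy_kvm *kvm, unsigned long sampled_seq)
{
    bool installed = false;

    toy_lock(kvm);
    if (kvm->mmu_seq == sampled_seq) {
        kvm->ptes_installed++;
        installed = true;
    }
    toy_unlock(kvm);
    return installed;
}

/* Flush path: drop all PTEs and bump the sequence number under the lock. */
static void toy_flush_memslot(struct toy_kvm *kvm)
{
    toy_lock(kvm);
    kvm->ptes_installed = 0;
    kvm->mmu_seq++;          /* the step this patch adds */
    toy_unlock(kvm);
}

/* Replays the racy interleaving; returns the number of PTEs left behind. */
static int toy_demo(void)
{
    struct toy_kvm kvm = { 0 };
    unsigned long seq = toy_fault_begin(&kvm);  /* fault reads old state */

    toy_flush_memslot(&kvm);                    /* flush completes first */
    (void)toy_fault_commit(&kvm, seq);          /* stale fault is rejected */
    return kvm.ptes_installed;
}
```

In this model, removing the increment from toy_flush_memslot() lets the
stale fault install a PTE after the flush has finished, which is the
interleaving the commit message describes.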

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index da8375437d161..9d73448354698 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1104,6 +1104,11 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
                                         kvm->arch.lpid);
                gpa += PAGE_SIZE;
        }
+       /*
+        * Increase the mmu notifier sequence number to prevent any page
+        * fault that read the memslot earlier from writing a PTE.
+        */
+       kvm->mmu_notifier_seq++;
        spin_unlock(&kvm->mmu_lock);
 }
 
-- 
2.25.1
