Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-25 Thread Peter Zijlstra
On Wed, May 24, 2023 at 02:39:50PM -0700, Sean Christopherson wrote:
> On Wed, May 24, 2023, Peter Zijlstra wrote:
> > On Wed, May 24, 2023 at 01:16:03PM -0700, Sean Christopherson wrote:
> > > Of course, the only accesses outside of mmu_lock are reads, so on x86 that
> > > "atomic" access is just a READ_ONCE() load, but that's not the case for
> > > all architectures.
> > 
> > This is true on *all* archs. atomic_set() and atomic_read() are no more
> > and no less than WRITE_ONCE() / READ_ONCE().
> 
> Ah, I take it s390's handcoded assembly routines are just paranoid
> equivalents and not truly special?  "l" and "st" do sound quite generic...

Yep, the compiler *should* generate the same code with READ_ONCE()/WRITE_ONCE().



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-24 Thread Kautuk Consul
On 2023-05-24 22:33:36, Peter Zijlstra wrote:
> On Wed, May 24, 2023 at 01:16:03PM -0700, Sean Christopherson wrote:
> 
> > Atomics aren't memory barriers on all architectures, e.g. see the various
> > definitions of smp_mb__after_atomic().
> > 
> > Even if atomic operations did provide barriers, using an atomic would be
> > overkill and a net negative.  On strongly ordered architectures like x86,
> > memory barriers are just compiler barriers, whereas atomics may be more
> > expensive.
> 
> Not quite, smp_{r,w}mb() and smp_mb__{before,after}_atomic() are
> compiler barriers on the TSO archs, but smp_mb() very much isn't. TSO
> still allows stores to be delayed vs later loads (iow it doesn't pretend
> to hide the store buffer).
> 
> > Of course, the only accesses outside of mmu_lock are reads, so on x86 that
> > "atomic" access is just a READ_ONCE() load, but that's not the case for all
> > architectures.
> 
> This is true on *all* archs. atomic_set() and atomic_read() are no more
> and no less than WRITE_ONCE() / READ_ONCE().
> 
> > Anyways, the point is that atomics and memory barriers are different things
> > that serve different purposes.
> 
> This is true; esp. on the weakly ordered architectures where atomics do
> not naturally imply any ordering.

Thanks for the information, everyone.



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-24 Thread Sean Christopherson
On Wed, May 24, 2023, Peter Zijlstra wrote:
> On Wed, May 24, 2023 at 01:16:03PM -0700, Sean Christopherson wrote:
> > Of course, the only accesses outside of mmu_lock are reads, so on x86 that
> > "atomic" access is just a READ_ONCE() load, but that's not the case for all
> > architectures.
> 
> This is true on *all* archs. atomic_set() and atomic_read() are no more
> and no less than WRITE_ONCE() / READ_ONCE().

Ah, I take it s390's handcoded assembly routines are just paranoid equivalents
and not truly special?  "l" and "st" do sound quite generic...

  commit 7657e41a0bd16c9d8b3cefe8fd5d6ac3c25ae4bf
  Author: Heiko Carstens 
  Date:   Thu Feb 17 13:13:58 2011 +0100

[S390] atomic: use inline asm

Use inline assemblies for atomic_read/set(). This way there shouldn't
be any questions or subtle volatile semantics left.

static inline int __atomic_read(const atomic_t *v)
{
	int c;

	asm volatile(
		"	l	%0,%1\n"
		: "=d" (c) : "R" (v->counter));
	return c;
}

static inline void __atomic_set(atomic_t *v, int i)
{
	asm volatile(
		"	st	%1,%0\n"
		: "=R" (v->counter) : "d" (i));
}



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-24 Thread Peter Zijlstra
On Wed, May 24, 2023 at 01:16:03PM -0700, Sean Christopherson wrote:

> Atomics aren't memory barriers on all architectures, e.g. see the various
> definitions of smp_mb__after_atomic().
> 
> Even if atomic operations did provide barriers, using an atomic would be
> overkill and a net negative.  On strongly ordered architectures like x86,
> memory barriers are just compiler barriers, whereas atomics may be more
> expensive.

Not quite, smp_{r,w}mb() and smp_mb__{before,after}_atomic() are
compiler barriers on the TSO archs, but smp_mb() very much isn't. TSO
still allows stores to be delayed vs later loads (iow it doesn't pretend
to hide the store buffer).

> Of course, the only accesses outside of mmu_lock are reads, so on x86 that
> "atomic" access is just a READ_ONCE() load, but that's not the case for all
> architectures.

This is true on *all* archs. atomic_set() and atomic_read() are no more
and no less than WRITE_ONCE() / READ_ONCE().

> Anyways, the point is that atomics and memory barriers are different things
> that serve different purposes.

This is true; esp. on the weakly ordered architectures where atomics do
not naturally imply any ordering.



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-24 Thread Peter Zijlstra
On Wed, May 24, 2023 at 11:42:15AM +0530, Kautuk Consul wrote:

> My comment was based on the assumption that "all atomic operations are
> implicit memory barriers". If that assumption is true then we won't need

It is not -- also see Documentation/atomic_t.txt.

Specifically atomic_read() doesn't imply any ordering on any
architecture including the strongly ordered TSO-archs (like x86).



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-24 Thread Sean Christopherson
On Wed, May 24, 2023, Kautuk Consul wrote:
> On 2023-05-23 07:19:43, Sean Christopherson wrote:
> > On Tue, May 23, 2023, Kautuk Consul wrote:
> > > On 2022-07-06 16:20:10, Chao Peng wrote:
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index e9153b54e2a4..c262ebb168a7 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -765,10 +765,10 @@ struct kvm {
> > > >  
> > > >  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > > > struct mmu_notifier mmu_notifier;
> > > > -   unsigned long mmu_notifier_seq;
> > > > -   long mmu_notifier_count;
> > > > -   gfn_t mmu_notifier_range_start;
> > > > -   gfn_t mmu_notifier_range_end;
> > > > +   unsigned long mmu_updating_seq;
> > > > +   long mmu_updating_count;
> > > 
> > > Can we convert mmu_updating_seq and mmu_updating_count to atomic_t ?
> > 
> > Heh, can we?  Yes.  Should we?  No.
> > 
> > > I see that not all accesses to these are under the kvm->mmu_lock
> > > spinlock.
> > 
> > Ya, working as intended.  Ignoring gfn_to_pfn_cache for the moment, all
> > accesses to mmu_invalidate_in_progress (was mmu_notifier_count /
> > mmu_updating_count above) are done under mmu_lock.  And for mmu_notifier_seq
> > (mmu_updating_seq above), all writes and some reads are done under mmu_lock.
> > The only reads that are done outside of mmu_lock are the initial snapshots
> > of the sequence number.
> > 
> > gfn_to_pfn_cache uses a different locking scheme, the comments in
> > mmu_notifier_retry_cache() do a good job explaining the ordering.
> > 
> > > This will also remove the need for putting separate smp_wmb() and
> > > smp_rmb() memory barriers while accessing these structure members.
> > 
> > No, the memory barriers aren't there to provide any kind of atomicity.  The
> > barriers exist to ensure that stores and loads to/from the sequence and
> > invalidate in-progress counts are ordered relative to the invalidation
> > (stores to counts) and creation (loads) of SPTEs.  Making the counts atomic
> > changes nothing because atomic operations don't guarantee the necessary
> > ordering.
> I'm not saying that the memory barriers provide atomicity.
> My comment was based on the assumption that "all atomic operations are
> implicit memory barriers". If that assumption is true then we won't need
> the memory barriers here if we use atomic operations for protecting
> these 2 structure members.

Atomics aren't memory barriers on all architectures, e.g. see the various
definitions of smp_mb__after_atomic().

Even if atomic operations did provide barriers, using an atomic would be
overkill and a net negative.  On strongly ordered architectures like x86,
memory barriers are just compiler barriers, whereas atomics may be more
expensive.  Of course, the only accesses outside of mmu_lock are reads, so on
x86 that "atomic" access is just a READ_ONCE() load, but that's not the case
for all architectures.

Anyways, the point is that atomics and memory barriers are different things that
serve different purposes.



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-24 Thread Kautuk Consul
On 2023-05-23 07:19:43, Sean Christopherson wrote:
> On Tue, May 23, 2023, Kautuk Consul wrote:
> > On 2022-07-06 16:20:10, Chao Peng wrote:
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index e9153b54e2a4..c262ebb168a7 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -765,10 +765,10 @@ struct kvm {
> > >  
> > >  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > >   struct mmu_notifier mmu_notifier;
> > > - unsigned long mmu_notifier_seq;
> > > - long mmu_notifier_count;
> > > - gfn_t mmu_notifier_range_start;
> > > - gfn_t mmu_notifier_range_end;
> > > + unsigned long mmu_updating_seq;
> > > + long mmu_updating_count;
> > 
> > Can we convert mmu_updating_seq and mmu_updating_count to atomic_t ?
> 
> Heh, can we?  Yes.  Should we?  No.
> 
> > I see that not all accesses to these are under the kvm->mmu_lock
> > spinlock.
> 
> Ya, working as intended.  Ignoring gfn_to_pfn_cache for the moment, all
> accesses to mmu_invalidate_in_progress (was mmu_notifier_count /
> mmu_updating_count above) are done under mmu_lock.  And for mmu_notifier_seq
> (mmu_updating_seq above), all writes and some reads are done under mmu_lock.
> The only reads that are done outside of mmu_lock are the initial snapshots of
> the sequence number.
> 
> gfn_to_pfn_cache uses a different locking scheme, the comments in
> mmu_notifier_retry_cache() do a good job explaining the ordering.
> 
> > This will also remove the need for putting separate smp_wmb() and
> > smp_rmb() memory barriers while accessing these structure members.
> 
> No, the memory barriers aren't there to provide any kind of atomicity.  The
> barriers exist to ensure that stores and loads to/from the sequence and
> invalidate in-progress counts are ordered relative to the invalidation
> (stores to counts) and creation (loads) of SPTEs.  Making the counts atomic
> changes nothing because atomic operations don't guarantee the necessary
> ordering.
I'm not saying that the memory barriers provide atomicity.
My comment was based on the assumption that "all atomic operations are
implicit memory barriers". If that assumption is true then we won't need
the memory barriers here if we use atomic operations for protecting
these 2 structure members.
> 
> E.g. when handling a page fault, KVM snapshots the sequence outside of
> mmu_lock _before_ touching any state that is involved in resolving the host
> pfn, e.g. primary MMU state (VMAs, host page tables, etc.).  After the page
> fault task acquires mmu_lock, KVM checks that there are no in-progress
> invalidations and that the sequence count is the same.  This ensures that if
> there is a concurrent page fault and invalidation event, the page fault task
> will either acquire mmu_lock and create SPTEs _before_ the invalidation is
> processed, or the page fault task will observe either an elevated
> mmu_invalidate_in_progress or a different sequence count, and thus retry the
> page fault, if the page fault task acquires mmu_lock after the invalidation
> event.



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-23 Thread Sean Christopherson
On Tue, May 23, 2023, Kautuk Consul wrote:
> On 2022-07-06 16:20:10, Chao Peng wrote:
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index e9153b54e2a4..c262ebb168a7 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -765,10 +765,10 @@ struct kvm {
> >  
> >  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> > struct mmu_notifier mmu_notifier;
> > -   unsigned long mmu_notifier_seq;
> > -   long mmu_notifier_count;
> > -   gfn_t mmu_notifier_range_start;
> > -   gfn_t mmu_notifier_range_end;
> > +   unsigned long mmu_updating_seq;
> > +   long mmu_updating_count;
> 
> Can we convert mmu_updating_seq and mmu_updating_count to atomic_t ?

Heh, can we?  Yes.  Should we?  No.

> I see that not all accesses to these are under the kvm->mmu_lock
> spinlock.

Ya, working as intended.  Ignoring gfn_to_pfn_cache for the moment, all
accesses to mmu_invalidate_in_progress (was mmu_notifier_count /
mmu_updating_count above) are done under mmu_lock.  And for mmu_notifier_seq
(mmu_updating_seq above), all writes and some reads are done under mmu_lock.
The only reads that are done outside of mmu_lock are the initial snapshots of
the sequence number.

gfn_to_pfn_cache uses a different locking scheme, the comments in
mmu_notifier_retry_cache() do a good job explaining the ordering.

> This will also remove the need for putting separate smp_wmb() and
> smp_rmb() memory barriers while accessing these structure members.

No, the memory barriers aren't there to provide any kind of atomicity.  The
barriers exist to ensure that stores and loads to/from the sequence and
invalidate in-progress counts are ordered relative to the invalidation (stores
to counts) and creation (loads) of SPTEs.  Making the counts atomic changes
nothing because atomic operations don't guarantee the necessary ordering.

E.g. when handling a page fault, KVM snapshots the sequence outside of
mmu_lock _before_ touching any state that is involved in resolving the host
pfn, e.g. primary MMU state (VMAs, host page tables, etc.).  After the page
fault task acquires mmu_lock, KVM checks that there are no in-progress
invalidations and that the sequence count is the same.  This ensures that if
there is a concurrent page fault and invalidation event, the page fault task
will either acquire mmu_lock and create SPTEs _before_ the invalidation is
processed, or the page fault task will observe either an elevated
mmu_invalidate_in_progress or a different sequence count, and thus retry the
page fault, if the page fault task acquires mmu_lock after the invalidation
event.



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2023-05-23 Thread Kautuk Consul
On 2022-07-06 16:20:10, Chao Peng wrote:
> The sync mechanism between the mmu_notifier and the page fault handler
> employs the fields mmu_notifier_seq/count and mmu_notifier_range_start/end.
> The to-be-added private memory needs the same mechanism, but it does not
> rely on the mmu_notifier (it uses the newly introduced memfile_notifier).
> This patch renames the existing fields and related helper functions to the
> neutral name mmu_updating_* so that private memory can reuse them.
> 
> No functional change intended.
> 
> Signed-off-by: Chao Peng 
> ---
>  arch/arm64/kvm/mmu.c |  8 ++---
>  arch/mips/kvm/mmu.c  | 10 +++---
>  arch/powerpc/include/asm/kvm_book3s_64.h |  2 +-
>  arch/powerpc/kvm/book3s_64_mmu_host.c|  4 +--
>  arch/powerpc/kvm/book3s_64_mmu_hv.c  |  4 +--
>  arch/powerpc/kvm/book3s_64_mmu_radix.c   |  6 ++--
>  arch/powerpc/kvm/book3s_hv_nested.c  |  2 +-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  8 ++---
>  arch/powerpc/kvm/e500_mmu_host.c |  4 +--
>  arch/riscv/kvm/mmu.c |  4 +--
>  arch/x86/kvm/mmu/mmu.c   | 14 
>  arch/x86/kvm/mmu/paging_tmpl.h   |  4 +--
>  include/linux/kvm_host.h | 38 ++---
>  virt/kvm/kvm_main.c  | 42 +++-
>  virt/kvm/pfncache.c  | 14 
>  15 files changed, 81 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 87f1cd0df36e..7ee6fafc24ee 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -993,7 +993,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>* THP doesn't start to split while we are adjusting the
>* refcounts.
>*
> -  * We are sure this doesn't happen, because mmu_notifier_retry
> +  * We are sure this doesn't happen, because mmu_updating_retry
>* was successful and we are holding the mmu_lock, so if this
>* THP is trying to split, it will be blocked in the mmu
>* notifier before touching any of the pages, specifically
> @@ -1188,9 +1188,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   return ret;
>   }
>  
> - mmu_seq = vcpu->kvm->mmu_notifier_seq;
> + mmu_seq = vcpu->kvm->mmu_updating_seq;
>   /*
> -  * Ensure the read of mmu_notifier_seq happens before we call
> +  * Ensure the read of mmu_updating_seq happens before we call
>* gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
>* the page we just got a reference to gets unmapped before we have a
>* chance to grab the mmu_lock, which ensure that if the page gets
> @@ -1246,7 +1246,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   else
>   write_lock(&kvm->mmu_lock);
>   pgt = vcpu->arch.hw_mmu->pgt;
> - if (mmu_notifier_retry(kvm, mmu_seq))
> + if (mmu_updating_retry(kvm, mmu_seq))
>   goto out_unlock;
>  
>   /*
> diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
> index 1bfd1b501d82..abd468c6a749 100644
> --- a/arch/mips/kvm/mmu.c
> +++ b/arch/mips/kvm/mmu.c
> @@ -615,17 +615,17 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
>* Used to check for invalidations in progress, of the pfn that is
>* returned by pfn_to_pfn_prot below.
>*/
> - mmu_seq = kvm->mmu_notifier_seq;
> + mmu_seq = kvm->mmu_updating_seq;
>   /*
> -  * Ensure the read of mmu_notifier_seq isn't reordered with PTE reads in
> +  * Ensure the read of mmu_updating_seq isn't reordered with PTE reads in
>* gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
>* risk the page we get a reference to getting unmapped before we have a
> -  * chance to grab the mmu_lock without mmu_notifier_retry() noticing.
> +  * chance to grab the mmu_lock without mmu_updating_retry() noticing.
>*
>* This smp_rmb() pairs with the effective smp_wmb() of the combination
>* of the pte_unmap_unlock() after the PTE is zapped, and the
> -  * spin_lock() in kvm_mmu_notifier_invalidate_<page|range_end>() before
> -  * mmu_notifier_seq is incremented.
> +  * mmu_updating_seq is incremented.
>*/
>   smp_rmb();
>  
> @@ -638,7 +638,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
>  
>   spin_lock(&kvm->mmu_lock);
>   /* Check if an invalidation has taken place since we got pfn */
> - if (mmu_notifier_retry(kvm, mmu_seq)) {
> + if (mmu_updating_retry(kvm, mmu_seq)) {
>   /*
>* This can happen when mappings are changed asynchronously, but
>* also synchronously if a COW is triggered by
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
> 

Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2022-08-10 Thread Chao Peng
On Fri, Aug 05, 2022 at 09:54:35PM +0200, Paolo Bonzini wrote:
> On 7/29/22 21:02, Sean Christopherson wrote:
> > If we really want a different name, I'd vote for nomenclature that captures
> > the invalidation aspect, which is really what the variables are all
> > tracking, e.g.
> > 
> >   mmu_invalidate_seq
> >   mmu_invalidate_in_progress
> >   mmu_invalidate_range_start
> >   mmu_invalidate_range_end
> > 
> 
> Agreed, and this can of course be committed separately if Chao Peng sends it
> outside this series.

I will do that, probably also includes:
  06/14 KVM: Rename KVM_PRIVATE_MEM_SLOT

Chao
> 
> Paolo



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2022-08-05 Thread Paolo Bonzini

On 7/29/22 21:02, Sean Christopherson wrote:

If we really want a different name, I'd vote for nomenclature that captures the
invalidation aspect, which is really what the variables are all tracking, e.g.

   mmu_invalidate_seq
   mmu_invalidate_in_progress
   mmu_invalidate_range_start
   mmu_invalidate_range_end



Agreed, and this can of course be committed separately if Chao Peng 
sends it outside this series.


Paolo



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2022-08-03 Thread Chao Peng
On Fri, Jul 29, 2022 at 07:02:12PM +, Sean Christopherson wrote:
> On Wed, Jul 06, 2022, Chao Peng wrote:
> > The sync mechanism between the mmu_notifier and the page fault handler
> > employs the fields mmu_notifier_seq/count and mmu_notifier_range_start/end.
> > The to-be-added private memory needs the same mechanism, but it does not
> > rely on the mmu_notifier (it uses the newly introduced memfile_notifier).
> > This patch renames the existing fields and related helper functions to the
> > neutral name mmu_updating_* so that private memory can reuse them.
> 
> mmu_updating_* is too broad of a term, e.g. page faults and many other
> operations also update the mmu.  Although the name most definitely came from
> the mmu_notifier, it's not completely inaccurate for other sources, e.g.
> KVM's MMU is still being notified of something, even if the source is not
> the actual mmu_notifier.
> 
> If we really want a different name, I'd vote for nomenclature that captures
> the invalidation aspect, which is really what the variables are all
> tracking, e.g.
> 
>   mmu_invalidate_seq
>   mmu_invalidate_in_progress
>   mmu_invalidate_range_start
>   mmu_invalidate_range_end

Looks good to me. Thanks.

Chao



Re: [PATCH v7 08/14] KVM: Rename mmu_notifier_*

2022-07-29 Thread Sean Christopherson
On Wed, Jul 06, 2022, Chao Peng wrote:
> The sync mechanism between the mmu_notifier and the page fault handler
> employs the fields mmu_notifier_seq/count and mmu_notifier_range_start/end.
> The to-be-added private memory needs the same mechanism, but it does not
> rely on the mmu_notifier (it uses the newly introduced memfile_notifier).
> This patch renames the existing fields and related helper functions to the
> neutral name mmu_updating_* so that private memory can reuse them.

mmu_updating_* is too broad of a term, e.g. page faults and many other
operations also update the mmu.  Although the name most definitely came from
the mmu_notifier, it's not completely inaccurate for other sources, e.g. KVM's
MMU is still being notified of something, even if the source is not the actual
mmu_notifier.

If we really want a different name, I'd vote for nomenclature that captures the
invalidation aspect, which is really what the variables are all tracking, e.g.

  mmu_invalidate_seq
  mmu_invalidate_in_progress
  mmu_invalidate_range_start
  mmu_invalidate_range_end