RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-06 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Tuesday, August 06, 2013 6:42 AM
 To: Wood Scott-B07421
 Cc: Benjamin Herrenschmidt; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, August 06, 2013 12:49 AM
  To: Bhushan Bharat-R65777
  Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-
  p...@vger.kernel.org; k...@vger.kernel.org;
  linuxppc-dev@lists.ozlabs.org
  Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup
  like booke3s
 
  On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
  
-Original Message-
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Saturday, August 03, 2013 9:54 AM
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
lookup like booke3s
   
On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
 One of the problem I saw was that if I put this code in
 asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and
 other friend function (on which this code depends) are defined in
 pgtable.h.
 And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
 before it defines pte_present() and friends functions.

 Ok I move wove this in asm/pgtable*.h, initially I fought with
 myself to take this code in pgtable* but finally end up doing
 here (got biased by book3s :)).
   
Is there a reason why these routines can not be completely generic
in pgtable.h ?
  
   How about the generic function:
  
   diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
   b/arch/powerpc/include/asm/pgtable-ppc64.h
   index d257d98..21daf28 100644
   --- a/arch/powerpc/include/asm/pgtable-ppc64.h
   +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
   @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct
   mm_struct
  *mm,
   return old;
}
  
   +static inline unsigned long pte_read(pte_t *p) { #ifdef
   +PTE_ATOMIC_UPDATES
   +   pte_t pte;
   +   pte_t tmp;
   +   __asm__ __volatile__ (
   +   1: ldarx   %0,0,%3\n
   +  andi.   %1,%0,%4\n
   +  bne-1b\n
   +  ori %1,%0,%4\n
   +  stdcx.  %1,0,%3\n
   +  bne-1b
   +   : =r (pte), =r (tmp), =m (*p)
   +   : r (p), i (_PAGE_BUSY)
   +   : cc);
   +
   +   return pte;
   +#else
   +   return pte_val(*p);
   +#endif
   +#endif
   +}
static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 unsigned long addr,
   pte_t *ptep)
 
  Please leave a blank line between functions.
 
{
   diff --git a/arch/powerpc/include/asm/pgtable.h
   b/arch/powerpc/include/asm/pgtable.h
   index 690c8c2..dad712c 100644
   --- a/arch/powerpc/include/asm/pgtable.h
   +++ b/arch/powerpc/include/asm/pgtable.h
   @@ -254,6 +254,45 @@ static inline pte_t
   *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }
   #endif
   /* !CONFIG_HUGETLB_PAGE */
  
   +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
   +int writing, unsigned long
   +*pte_sizep)
 
  The name implies that it just reads the PTE.  Setting accessed/dirty
  shouldn't be an undocumented side-effect.
 
 Ok, will rename and document.
 
  Why can't the caller do that (or a different function that the caller
  calls afterward if desired)?
 
 The current implementation in book3s is;
  1) find a pte/hugepte
  2) return null if pte not present
  3) take _PAGE_BUSY lock
  4) set accessed/dirty
  5) clear _PAGE_BUSY.
 
 What I tried was
 1) find a pte/hugepte
 2) return null if pte not present
 3) return pte (not take lock by not setting _PAGE_BUSY)
 
 4) then user calls  __ptep_set_access_flags() to atomic update the
 dirty/accessed flags in pte.
 
 - but the benchmark results were not good
 - Also can there be race as we do not take lock in step 3 and update in step 
 4 ?
 
 
  Though even then you have the undocumented side effect of locking the
  PTE on certain targets.
 
   +{
   +   pte_t *ptep;
   +   pte_t pte;
   +   unsigned long ps = *pte_sizep;
   +   unsigned int shift;
   +
   +   ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
   +   if (!ptep)
   +   return __pte(0);
   +   if (shift)
   +   *pte_sizep = 1ul  shift;
   +   else
   +   *pte_sizep = PAGE_SIZE;
   +
   +   if (ps  *pte_sizep)
   +   return __pte(0);
   +
   +   if (!pte_present(*ptep))
   +   return __pte(0);
   +
   +#ifdef CONFIG_PPC64
   +   /* Lock

RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-06 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, August 06, 2013 12:49 AM
 To: Bhushan Bharat-R65777
 Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-
 p...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
   Sent: Saturday, August 03, 2013 9:54 AM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
   k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
   Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
   lookup like booke3s
  
   On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
One of the problem I saw was that if I put this code in
asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
friend function (on which this code depends) are defined in pgtable.h.
And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
before it defines pte_present() and friends functions.
   
Ok I move wove this in asm/pgtable*.h, initially I fought with
myself to take this code in pgtable* but finally end up doing here
(got biased by book3s :)).
  
   Is there a reason why these routines can not be completely generic
   in pgtable.h ?
 
  How about the generic function:
 
  diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
  b/arch/powerpc/include/asm/pgtable-ppc64.h
  index d257d98..21daf28 100644
  --- a/arch/powerpc/include/asm/pgtable-ppc64.h
  +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
  @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct
 *mm,
  return old;
   }
 
  +static inline unsigned long pte_read(pte_t *p) { #ifdef
  +PTE_ATOMIC_UPDATES
  +   pte_t pte;
  +   pte_t tmp;
  +   __asm__ __volatile__ (
  +   1: ldarx   %0,0,%3\n
  +  andi.   %1,%0,%4\n
  +  bne-1b\n
  +  ori %1,%0,%4\n
  +  stdcx.  %1,0,%3\n
  +  bne-1b
  +   : =r (pte), =r (tmp), =m (*p)
  +   : r (p), i (_PAGE_BUSY)
  +   : cc);
  +
  +   return pte;
  +#else
  +   return pte_val(*p);
  +#endif
  +#endif
  +}
   static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
unsigned long addr,
  pte_t *ptep)
 
 Please leave a blank line between functions.
 
   {
  diff --git a/arch/powerpc/include/asm/pgtable.h
  b/arch/powerpc/include/asm/pgtable.h
  index 690c8c2..dad712c 100644
  --- a/arch/powerpc/include/asm/pgtable.h
  +++ b/arch/powerpc/include/asm/pgtable.h
  @@ -254,6 +254,45 @@ static inline pte_t
  *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }  #endif
  /* !CONFIG_HUGETLB_PAGE */
 
  +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
  +int writing, unsigned long
  +*pte_sizep)
 
 The name implies that it just reads the PTE.  Setting accessed/dirty shouldn't
 be an undocumented side-effect.  Why can't the caller do that (or a different
 function that the caller calls afterward if desired)?

Scott, I sent the next version of patch based on above idea. Now I think we do 
not need to update the pte flags on booke 
So we do not need to solve the kvmppc_read_update_linux_pte() stuff of book3s.

-Bharat

 
 Though even then you have the undocumented side effect of locking the PTE on
 certain targets.
 
  +{
  +   pte_t *ptep;
  +   pte_t pte;
  +   unsigned long ps = *pte_sizep;
  +   unsigned int shift;
  +
  +   ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
  +   if (!ptep)
  +   return __pte(0);
  +   if (shift)
  +   *pte_sizep = 1ul  shift;
  +   else
  +   *pte_sizep = PAGE_SIZE;
  +
  +   if (ps  *pte_sizep)
  +   return __pte(0);
  +
  +   if (!pte_present(*ptep))
  +   return __pte(0);
  +
  +#ifdef CONFIG_PPC64
  +   /* Lock PTE (set _PAGE_BUSY) and read */
  +   pte = pte_read(ptep);
  +#else
  +   pte = pte_val(*ptep);
  +#endif
 
 What about 32-bit platforms that need atomic PTEs?
 
 -Scott
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-06 Thread Paul Mackerras
On Tue, Aug 06, 2013 at 07:02:48AM +, Bhushan Bharat-R65777 wrote:
 
 I am trying to me the Linux pte search and update generic so that this can be 
 used for powerpc as well.
 
 I am not sure which of the below two should be ok, please help

Given that the BookE code uses gfn_to_pfn_memslot() to get the host
pfn, and then kvm_set_pfn_dirty(pfn) on pages that you're going to let
the guest write to, I don't think you need to set the dirty and/or
accessed bits in the Linux PTE explicitly.  If you care about the
WIMGE bits you can do find_linux_pte_or_hugepte() and just look at the
PTE, but you really should be using mmu_notifier_retry() to guard
against concurrent changes to the Linux PTE.  See the HV KVM code or
patch 21 of my recent series to see how it's used.  You probably
should be calling kvm_set_pfn_accessed() as well.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-06 Thread Scott Wood
On Wed, 2013-08-07 at 10:24 +1000, Paul Mackerras wrote:
 On Tue, Aug 06, 2013 at 07:02:48AM +, Bhushan Bharat-R65777 wrote:
  
  I am trying to me the Linux pte search and update generic so that this can 
  be used for powerpc as well.
  
  I am not sure which of the below two should be ok, please help
 
 Given that the BookE code uses gfn_to_pfn_memslot() to get the host
 pfn, and then kvm_set_pfn_dirty(pfn) on pages that you're going to let
 the guest write to, I don't think you need to set the dirty and/or
 accessed bits in the Linux PTE explicitly.  If you care about the
 WIMGE bits you can do find_linux_pte_or_hugepte() and just look at the
 PTE, but you really should be using mmu_notifier_retry() to guard
 against concurrent changes to the Linux PTE.  See the HV KVM code or
 patch 21 of my recent series to see how it's used. 

Hmm... we only get a callback on invalidate_range_start(), not
invalidate_range_end() (and even if we did get a callback for the
latter, it'd probably be racy).  So we may have a problem here
regardless of getting WIMG from the PTE, unless it's guaranteed that
hva_to_pfn() will fail after invalidate_range_start().

  You probably should be calling kvm_set_pfn_accessed() as well.

Yeah...  I think it'll only affect the quality of page-out decisions (as
opposed to corruption and such), but still it should be fixed.

-Scott



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-06 Thread Paul Mackerras
On Tue, Aug 06, 2013 at 08:11:34PM -0500, Scott Wood wrote:
 On Wed, 2013-08-07 at 10:24 +1000, Paul Mackerras wrote:
  On Tue, Aug 06, 2013 at 07:02:48AM +, Bhushan Bharat-R65777 wrote:
   
   I am trying to me the Linux pte search and update generic so that this 
   can be used for powerpc as well.
   
   I am not sure which of the below two should be ok, please help
  
  Given that the BookE code uses gfn_to_pfn_memslot() to get the host
  pfn, and then kvm_set_pfn_dirty(pfn) on pages that you're going to let
  the guest write to, I don't think you need to set the dirty and/or
  accessed bits in the Linux PTE explicitly.  If you care about the
  WIMGE bits you can do find_linux_pte_or_hugepte() and just look at the
  PTE, but you really should be using mmu_notifier_retry() to guard
  against concurrent changes to the Linux PTE.  See the HV KVM code or
  patch 21 of my recent series to see how it's used. 
 
 Hmm... we only get a callback on invalidate_range_start(), not
 invalidate_range_end() (and even if we did get a callback for the
 latter, it'd probably be racy).  So we may have a problem here
 regardless of getting WIMG from the PTE, unless it's guaranteed that
 hva_to_pfn() will fail after invalidate_range_start().

No, it's not guaranteed.  You have to use mmu_notifier_retry().  It
tells you if either (a) some sort of invalidation has happened since
you snapshotted kvm-mmu_notifier_seq, or (b) an
invalidate_range_start...end sequence is currently in progress.  In
either case you should discard any PTE or pfn information you
collected and retry.

   You probably should be calling kvm_set_pfn_accessed() as well.
 
 Yeah...  I think it'll only affect the quality of page-out decisions (as
 opposed to corruption and such), but still it should be fixed.

Right.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-05 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Saturday, August 03, 2013 9:54 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
  One of the problem I saw was that if I put this code in
  asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
  friend function (on which this code depends) are defined in pgtable.h.
  And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it
  defines pte_present() and friends functions.
 
  Ok I move wove this in asm/pgtable*.h, initially I fought with myself
  to take this code in pgtable* but finally end up doing here (got
  biased by book3s :)).
 
 Is there a reason why these routines can not be completely generic in 
 pgtable.h
 ?

How about the generic function:

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index d257d98..21daf28 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
return old;
 }

+static inline unsigned long pte_read(pte_t *p)
+{
+#ifdef PTE_ATOMIC_UPDATES
+   pte_t pte;
+   pte_t tmp;
+   __asm__ __volatile__ (
+   1: ldarx   %0,0,%3\n
+  andi.   %1,%0,%4\n
+  bne-1b\n
+  ori %1,%0,%4\n
+  stdcx.  %1,0,%3\n
+  bne-1b
+   : =r (pte), =r (tmp), =m (*p)
+   : r (p), i (_PAGE_BUSY)
+   : cc);
+
+   return pte;
+#else  
+   return pte_val(*p);
+#endif
+#endif
+}
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 690c8c2..dad712c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t 
*pgdir, unsigned long ea,
 }
 #endif /* !CONFIG_HUGETLB_PAGE */

+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+int writing, unsigned long *pte_sizep)
+{
+   pte_t *ptep;
+   pte_t pte;
+   unsigned long ps = *pte_sizep;
+   unsigned int shift;
+
+   ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
+   if (!ptep)
+   return __pte(0);
+   if (shift)
+   *pte_sizep = 1ul  shift;
+   else
+   *pte_sizep = PAGE_SIZE;
+
+   if (ps  *pte_sizep)
+   return __pte(0);
+
+   if (!pte_present(*ptep))
+   return __pte(0);
+
+#ifdef CONFIG_PPC64
+   /* Lock PTE (set _PAGE_BUSY) and read */
+   pte = pte_read(ptep);
+#else
+   pte = pte_val(*ptep);
+#endif
+   if (pte_present(pte)) {
+   pte = pte_mkyoung(pte);
+   if (writing  pte_write(pte))
+   pte = pte_mkdirty(pte);
+   }
+
+   *ptep = __pte(pte); /* 64bit: Also unlock pte (clear _PAGE_BUSY) */
+
+   return pte;
+}
+
 #endif /* __ASSEMBLY__ */

 #endif /* __KERNEL__ */
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-05 Thread Scott Wood
On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
  Sent: Saturday, August 03, 2013 9:54 AM
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
  k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
  Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
  booke3s
  
  On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
   One of the problem I saw was that if I put this code in
   asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
   friend function (on which this code depends) are defined in pgtable.h.
   And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it
   defines pte_present() and friends functions.
  
   Ok I move wove this in asm/pgtable*.h, initially I fought with myself
   to take this code in pgtable* but finally end up doing here (got
   biased by book3s :)).
  
  Is there a reason why these routines can not be completely generic in 
  pgtable.h
  ?
 
 How about the generic function:
 
 diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
 b/arch/powerpc/include/asm/pgtable-ppc64.h
 index d257d98..21daf28 100644
 --- a/arch/powerpc/include/asm/pgtable-ppc64.h
 +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
 @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct 
 *mm,
 return old;
  }
 
 +static inline unsigned long pte_read(pte_t *p)
 +{
 +#ifdef PTE_ATOMIC_UPDATES
 +   pte_t pte;
 +   pte_t tmp;
 +   __asm__ __volatile__ (
 +   1: ldarx   %0,0,%3\n
 +  andi.   %1,%0,%4\n
 +  bne-1b\n
 +  ori %1,%0,%4\n
 +  stdcx.  %1,0,%3\n
 +  bne-1b
 +   : =r (pte), =r (tmp), =m (*p)
 +   : r (p), i (_PAGE_BUSY)
 +   : cc);
 +
 +   return pte;
 +#else  
 +   return pte_val(*p);
 +#endif
 +#endif
 +}
  static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)

Please leave a blank line between functions.

  {
 diff --git a/arch/powerpc/include/asm/pgtable.h 
 b/arch/powerpc/include/asm/pgtable.h
 index 690c8c2..dad712c 100644
 --- a/arch/powerpc/include/asm/pgtable.h
 +++ b/arch/powerpc/include/asm/pgtable.h
 @@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t 
 *pgdir, unsigned long ea,
  }
  #endif /* !CONFIG_HUGETLB_PAGE */
 
 +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
 +int writing, unsigned long *pte_sizep)

The name implies that it just reads the PTE.  Setting accessed/dirty
shouldn't be an undocumented side-effect.  Why can't the caller do that
(or a different function that the caller calls afterward if desired)?  

Though even then you have the undocumented side effect of locking the
PTE on certain targets.

 +{
 +   pte_t *ptep;
 +   pte_t pte;
 +   unsigned long ps = *pte_sizep;
 +   unsigned int shift;
 +
 +   ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
 +   if (!ptep)
 +   return __pte(0);
 +   if (shift)
 +   *pte_sizep = 1ul  shift;
 +   else
 +   *pte_sizep = PAGE_SIZE;
 +
 +   if (ps  *pte_sizep)
 +   return __pte(0);
 +
 +   if (!pte_present(*ptep))
 +   return __pte(0);
 +
 +#ifdef CONFIG_PPC64
 +   /* Lock PTE (set _PAGE_BUSY) and read */
 +   pte = pte_read(ptep);
 +#else
 +   pte = pte_val(*ptep);
 +#endif

What about 32-bit platforms that need atomic PTEs?

-Scott



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-05 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, August 06, 2013 12:49 AM
 To: Bhushan Bharat-R65777
 Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-
 p...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
   Sent: Saturday, August 03, 2013 9:54 AM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
   k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
   Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
   lookup like booke3s
  
   On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
One of the problem I saw was that if I put this code in
asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
friend function (on which this code depends) are defined in pgtable.h.
And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
before it defines pte_present() and friends functions.
   
Ok I move wove this in asm/pgtable*.h, initially I fought with
myself to take this code in pgtable* but finally end up doing here
(got biased by book3s :)).
  
   Is there a reason why these routines can not be completely generic
   in pgtable.h ?
 
  How about the generic function:
 
  diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
  b/arch/powerpc/include/asm/pgtable-ppc64.h
  index d257d98..21daf28 100644
  --- a/arch/powerpc/include/asm/pgtable-ppc64.h
  +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
  @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct
 *mm,
  return old;
   }
 
  +static inline unsigned long pte_read(pte_t *p) { #ifdef
  +PTE_ATOMIC_UPDATES
  +   pte_t pte;
  +   pte_t tmp;
  +   __asm__ __volatile__ (
  +   1: ldarx   %0,0,%3\n
  +  andi.   %1,%0,%4\n
  +  bne-1b\n
  +  ori %1,%0,%4\n
  +  stdcx.  %1,0,%3\n
  +  bne-1b
  +   : =r (pte), =r (tmp), =m (*p)
  +   : r (p), i (_PAGE_BUSY)
  +   : cc);
  +
  +   return pte;
  +#else
  +   return pte_val(*p);
  +#endif
  +#endif
  +}
   static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
unsigned long addr,
  pte_t *ptep)
 
 Please leave a blank line between functions.
 
   {
  diff --git a/arch/powerpc/include/asm/pgtable.h
  b/arch/powerpc/include/asm/pgtable.h
  index 690c8c2..dad712c 100644
  --- a/arch/powerpc/include/asm/pgtable.h
  +++ b/arch/powerpc/include/asm/pgtable.h
  @@ -254,6 +254,45 @@ static inline pte_t
  *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }  #endif
  /* !CONFIG_HUGETLB_PAGE */
 
  +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
  +int writing, unsigned long
  +*pte_sizep)
 
 The name implies that it just reads the PTE.  Setting accessed/dirty shouldn't
 be an undocumented side-effect.

Ok, will rename and document.

 Why can't the caller do that (or a different
 function that the caller calls afterward if desired)?

The current implementation in book3s is;
 1) find a pte/hugepte
 2) return null if pte not present
 3) take _PAGE_BUSY lock
 4) set accessed/dirty
 5) clear _PAGE_BUSY.

What I tried was 
1) find a pte/hugepte
2) return null if pte not present
3) return pte (not take lock by not setting _PAGE_BUSY)

4) then user calls  __ptep_set_access_flags() to atomic update the 
dirty/accessed flags in pte.

- but the benchmark results were not good
- Also can there be race as we do not take lock in step 3 and update in step 4 ?
  
 
 Though even then you have the undocumented side effect of locking the PTE on
 certain targets.
 
  +{
  +   pte_t *ptep;
  +   pte_t pte;
  +   unsigned long ps = *pte_sizep;
  +   unsigned int shift;
  +
  +   ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
  +   if (!ptep)
  +   return __pte(0);
  +   if (shift)
  +   *pte_sizep = 1ul  shift;
  +   else
  +   *pte_sizep = PAGE_SIZE;
  +
  +   if (ps  *pte_sizep)
  +   return __pte(0);
  +
  +   if (!pte_present(*ptep))
  +   return __pte(0);
  +
  +#ifdef CONFIG_PPC64
  +   /* Lock PTE (set _PAGE_BUSY) and read */
  +   pte = pte_read(ptep);
  +#else
  +   pte = pte_val(*ptep);
  +#endif
 
 What about 32-bit platforms that need atomic PTEs?

I called __ptep_set_access_flags() for both 32/64bit (for 64bit I was not 
calling pte_read()), which handles atomic updates. Somehow the benchmark result 
were not good, will try again.

Thanks
-Bharat
 
 -Scott
 

___
Linuxppc-dev mailing list
Linuxppc-dev

Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-02 Thread “tiejun.chen”

On 08/01/2013 07:12 PM, Bharat Bhushan wrote:

KVM need to lookup linux pte for getting TLB attributes (WIMGE).
This is similar to how book3s does.
This will be used in follow-up patches.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v1-v2
  - This is a new change in this version

  arch/powerpc/include/asm/kvm_booke.h |   73 ++
  1 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index d3c1eb3..903624d 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -102,4 +102,77 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
  {
return vcpu-arch.shared-msr;
  }
+
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+{
+   pte_t pte;
+
+#ifdef PTE_ATOMIC_UPDATES
+   pte_t tmp;
+/* wait until _PAGE_BUSY is clear then set it atomically */
+#ifdef CONFIG_PPC64
+   __asm__ __volatile__ (
+   1:ldarx   %0,0,%3\n
+ andi.   %1,%0,%4\n
+ bne-1b\n
+ ori %1,%0,%4\n
+ stdcx.  %1,0,%3\n
+ bne-1b
+   : =r (pte), =r (tmp), =m (*p)
+   : r (p), i (_PAGE_BUSY)
+   : cc);
+#else
+__asm__ __volatile__ (
+1: lwarx   %0,0,%3\n
+   andi.   %1,%0,%4\n
+   bne-1b\n
+   ori %1,%0,%4\n
+   stwcx.  %1,0,%3\n
+   bne-1b
+: =r (pte), =r (tmp), =m (*p)
+: r (p), i (_PAGE_BUSY)
+: cc);
+#endif
+#else
+   pte = pte_val(*p);
+#endif
+
+   if (pte_present(pte)) {
+   pte = pte_mkyoung(pte);
+   if (writing  pte_write(pte))
+   pte = pte_mkdirty(pte);
+   }
+
+   *p = pte;   /* clears _PAGE_BUSY */
+
+   return pte;
+}
+
+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+ int writing, unsigned long *pte_sizep)


Looks this function is as same as book3s, so why not improve that as common :)

Tiejun


+{
+   pte_t *ptep;
+   unsigned long ps = *pte_sizep;
+   unsigned int shift;
+
+   ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
+   if (!ptep)
+   return __pte(0);
+   if (shift)
+   *pte_sizep = 1ul  shift;
+   else
+   *pte_sizep = PAGE_SIZE;
+
+   if (ps  *pte_sizep)
+   return __pte(0);
+   if (!pte_present(*ptep))
+   return __pte(0);
+
+   return kvmppc_read_update_linux_pte(ptep, writing);
+}
+
  #endif /* __ASM_KVM_BOOKE_H__ */



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-02 Thread Scott Wood
On Thu, 2013-08-01 at 16:42 +0530, Bharat Bhushan wrote:
 KVM need to lookup linux pte for getting TLB attributes (WIMGE).
 This is similar to how book3s does.
 This will be used in follow-up patches.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 v1-v2
  - This is a new change in this version
 
  arch/powerpc/include/asm/kvm_booke.h |   73 
 ++
  1 files changed, 73 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_booke.h 
 b/arch/powerpc/include/asm/kvm_booke.h
 index d3c1eb3..903624d 100644
 --- a/arch/powerpc/include/asm/kvm_booke.h
 +++ b/arch/powerpc/include/asm/kvm_booke.h
 @@ -102,4 +102,77 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
  {
   return vcpu-arch.shared-msr;
  }
 +
 +/*
 + * Lock and read a linux PTE.  If it's present and writable, atomically
 + * set dirty and referenced bits and return the PTE, otherwise return 0.
 + */
 +static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
 +{
 + pte_t pte;
 +
 +#ifdef PTE_ATOMIC_UPDATES
 + pte_t tmp;
 +/* wait until _PAGE_BUSY is clear then set it atomically */

_PAGE_BUSY is 0 on book3e.

 +#ifdef CONFIG_PPC64
 + __asm__ __volatile__ (
 + 1: ldarx   %0,0,%3\n
 +andi.   %1,%0,%4\n
 +bne-1b\n
 +ori %1,%0,%4\n
 +stdcx.  %1,0,%3\n
 +bne-1b
 + : =r (pte), =r (tmp), =m (*p)
 + : r (p), i (_PAGE_BUSY)
 + : cc);
 +#else
 +__asm__ __volatile__ (
 +1: lwarx   %0,0,%3\n
 +   andi.   %1,%0,%4\n
 +   bne-1b\n
 +   ori %1,%0,%4\n
 +   stwcx.  %1,0,%3\n
 +   bne-1b
 +: =r (pte), =r (tmp), =m (*p)
 +: r (p), i (_PAGE_BUSY)
 +: cc);
 +#endif

What about 64-bit PTEs on 32-bit kernels?

In any case, this code does not belong in KVM.  It should be in the main
PPC mm code, even if KVM is the only user.

-Scott



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-02 Thread Benjamin Herrenschmidt
On Fri, 2013-08-02 at 17:58 -0500, Scott Wood wrote:
 
 What about 64-bit PTEs on 32-bit kernels?
 
 In any case, this code does not belong in KVM.  It should be in the
 main
 PPC mm code, even if KVM is the only user.

Also don't we do similar things in BookS KVM ? At the very least that
sutff should become common. And yes, I agree, it should probably also
move to pgtable*

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-02 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Saturday, August 03, 2013 4:47 AM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 On Fri, 2013-08-02 at 17:58 -0500, Scott Wood wrote:
 
  What about 64-bit PTEs on 32-bit kernels?
 
  In any case, this code does not belong in KVM.  It should be in the
  main PPC mm code, even if KVM is the only user.
 
 Also don't we do similar things in BookS KVM ? At the very least that sutff
 should become common. And yes, I agree, it should probably also move to 
 pgtable*

One of the problem I saw was that if I put this code in asm/pgtable-32.h and 
asm/pgtable-64.h then pte_persent() and other friend function (on which this 
code depends) are defined in pgtable.h. And pgtable.h includes asm/pgtable-32.h 
and asm/pgtable-64.h before it defines pte_present() and friends functions.

Ok I move wove this in asm/pgtable*.h, initially I fought with myself to take 
this code in pgtable* but finally end up doing here (got biased by book3s :)).

Thanks
-Bharat

 
 Cheers,
 Ben.
 
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-02 Thread Benjamin Herrenschmidt
On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
 One of the problem I saw was that if I put this code in
 asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
 friend function (on which this code depends) are defined in pgtable.h.
 And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it
 defines pte_present() and friends functions.
 
 Ok I move wove this in asm/pgtable*.h, initially I fought with myself
 to take this code in pgtable* but finally end up doing here (got
 biased by book3s :)).

Is there a reason why these routines can not be completely generic
in pgtable.h ?

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev