Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-22 Thread Ram Pai
On Thu, Jun 22, 2017 at 02:37:27PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> > in the 4K backed hpte pages. These bits continue to be used
> > for 64K backed hpte pages in this patch, but will be freed
> > up in the next patch.
> > 
> > The patch makes the following changes to the 64K PTE format:
> > 
> > H_PAGE_BUSY moves from bit 3 to bit 9.
> > H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> > of the pte.
> > H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> > second part of the pte.
> > 
> > The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> > are initialized to 0xF, indicating an invalid slot. If a hpte gets
> > cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
> > group), it is released immediately. In other words, even though 0xF is a
> > valid slot, we discard it and consider it an invalid
> > slot; see hpte_soft_invalid(). This gives us an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
> > 
> > When we release a hpte in the 0xF slot, we also release a
> > legitimate primary slot and unmap that entry. This is to
> > ensure that we do get a legitimate non-0xF slot the next time we
> > retry for a slot.
> > 
> > Though treating the 0xF slot as invalid reduces the number of available
> > slots and may have an effect on performance, the probability
> > of hitting a 0xF is extremely low.
> > 
> > Compared to the current scheme, the above described scheme reduces
> > the number of false hash table updates significantly and has the
> > added advantage of releasing four valuable PTE bits for other
> > purposes.
> > 
> > This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> > Ellerman and myself.
> > 
> > The 4K PTE format remains unchanged for now.
> 
> Scanned through the PTE format again for hash 64K and 4K. It seems
> to me that there might be 5 free bits already present in the PTE
> format. I may be seriously mistaken here :) Please
> correct me if that is not the case. I think _RPAGE_RPN* is applicable
> only to the hash page table format and will not be available for radix
> later.
> 
> +#define _PAGE_FREE_1   0x0040UL /* Not used */
> +#define _RPAGE_SW0 0x2000UL /* Not used */
> +#define _RPAGE_SW1 0x0800UL /* Not used */
> +#define _RPAGE_RPN42   0x0040UL /* Not used */
> +#define _RPAGE_RPN41   0x0020UL /* Not used */
> 

The bits were chosen to be future-proof for the radix implementation.
Using _RPAGE_SW* would eat into what is available for software in the
future, and these key bits will certainly be something that the radix
hardware will read in the future.

The _RPAGE_RPN* bits cannot be relied on for radix.

In the end, the bits that we chose (H_PAGE_F_SECOND|H_PAGE_F_GIX) had
the best potential for giving us the highest number of free bits with
relatively little effort.

RP



Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-22 Thread Anshuman Khandual
On 06/17/2017 09:22 AM, Ram Pai wrote:
> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> in the 4K backed hpte pages. These bits continue to be used
> for 64K backed hpte pages in this patch, but will be freed
> up in the next patch.
> 
> The patch makes the following changes to the 64K PTE format:
> 
> H_PAGE_BUSY moves from bit 3 to bit 9.
> H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> of the pte.
> H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> second part of the pte.
> 
> The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> are initialized to 0xF, indicating an invalid slot. If a hpte gets
> cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
> group), it is released immediately. In other words, even though 0xF is a
> valid slot, we discard it and consider it an invalid
> slot; see hpte_soft_invalid(). This gives us an opportunity to not
> depend on a bit in the primary PTE in order to determine the
> validity of a slot.
> 
> When we release a hpte in the 0xF slot, we also release a
> legitimate primary slot and unmap that entry. This is to
> ensure that we do get a legitimate non-0xF slot the next time we
> retry for a slot.
> 
> Though treating the 0xF slot as invalid reduces the number of available
> slots and may have an effect on performance, the probability
> of hitting a 0xF is extremely low.
> 
> Compared to the current scheme, the above described scheme reduces
> the number of false hash table updates significantly and has the
> added advantage of releasing four valuable PTE bits for other
> purposes.
> 
> This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> Ellerman and myself.
> 
> The 4K PTE format remains unchanged for now.

Scanned through the PTE format again for hash 64K and 4K. It seems
to me that there might be 5 free bits already present in the PTE
format. I may be seriously mistaken here :) Please
correct me if that is not the case. I think _RPAGE_RPN* is applicable
only to the hash page table format and will not be available for radix
later.

+#define _PAGE_FREE_1   0x0040UL /* Not used */
+#define _RPAGE_SW0 0x2000UL /* Not used */
+#define _RPAGE_SW1 0x0800UL /* Not used */
+#define _RPAGE_RPN42   0x0040UL /* Not used */
+#define _RPAGE_RPN41   0x0020UL /* Not used */




Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-21 Thread Ram Pai
On Wed, Jun 21, 2017 at 12:11:32PM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> > in the 4K backed hpte pages. These bits continue to be used
> > for 64K backed hpte pages in this patch, but will be freed
> > up in the next patch.
> >
> > The patch makes the following changes to the 64K PTE format:
> >
> > H_PAGE_BUSY moves from bit 3 to bit 9.
> > H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> > of the pte.
> > H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> > second part of the pte.
> >
> > The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> > are initialized to 0xF, indicating an invalid slot. If a hpte gets
> > cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
> > group), it is released immediately. In other words, even though 0xF is a
> > valid slot, we discard it and consider it an invalid
> > slot; see hpte_soft_invalid(). This gives us an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
> >
> > When we release a hpte in the 0xF slot, we also release a
> > legitimate primary slot and unmap that entry. This is to
> > ensure that we do get a legitimate non-0xF slot the next time we
> > retry for a slot.
> >
> > Though treating the 0xF slot as invalid reduces the number of available
> > slots and may have an effect on performance, the probability
> > of hitting a 0xF is extremely low.
> >
> > Compared to the current scheme, the above described scheme reduces
> > the number of false hash table updates significantly and has the
> > added advantage of releasing four valuable PTE bits for other
> > purposes.
> >
> > This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> > Ellerman and myself.
> >
> > The 4K PTE format remains unchanged for now.
> >
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/book3s/64/hash-4k.h  | 20 +++
> >  arch/powerpc/include/asm/book3s/64/hash-64k.h | 32 +++
> >  arch/powerpc/include/asm/book3s/64/hash.h     | 15 +++--
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  5 ++
> >  arch/powerpc/mm/dump_linuxpagetables.c        |  3 +-
> >  arch/powerpc/mm/hash64_4k.c                   | 14 ++---
> >  arch/powerpc/mm/hash64_64k.c                  | 81 ---
> >  arch/powerpc/mm/hash_utils_64.c               | 30 +++---
> >  8 files changed, 122 insertions(+), 78 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > index b4b5e6b..5ef1d81 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > @@ -16,6 +16,18 @@
> >  #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
> >  #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
> >
> > +
> > +/*
> > + * Only supported by 4k linux page size
> > + */
> > +#define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> > +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> > +#define H_PAGE_F_GIX_SHIFT 56
> > +
> > +#define H_PAGE_BUSY    _RPAGE_RSV1  /* software: PTE & hash are busy */
> > +#define H_PAGE_HASHPTE _RPAGE_RPN43 /* PTE has associated HPTE */
> > +
> > +
> >  /* PTE flags to conserve for HPTE identification */
> >  #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
> >  H_PAGE_F_SECOND | H_PAGE_F_GIX)
> > @@ -48,6 +60,14 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
> >  }
> >  #endif
> >
> > +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> > +   unsigned int subpg_index, unsigned long slot)
> > +{
> > +   return (slot << H_PAGE_F_GIX_SHIFT) &
> > +   (H_PAGE_F_SECOND | H_PAGE_F_GIX);
> > +}
> > +
> > +
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >
> >  static inline char *get_hpte_slot_array(pmd_t *pmdp)
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> > index 9732837..0eb3c89 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> > @@ -10,23 +10,25 @@
> >   * 64k aligned address free up few of the lower bits of RPN for us
> >   * We steal that here. For more deatils look at pte_pfn/pfn_pte()
> >   */
> > -#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
> > -#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
> > +#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
> > +#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
> > +#define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> > +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> > +#define H_PAGE_F_GIX_SHIFT 56

Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-20 Thread Aneesh Kumar K.V
Ram Pai  writes:

> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> in the 4K backed hpte pages. These bits continue to be used
> for 64K backed hpte pages in this patch, but will be freed
> up in the next patch.
>
> The patch makes the following changes to the 64K PTE format:
>
> H_PAGE_BUSY moves from bit 3 to bit 9.
> H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> of the pte.
> H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> second part of the pte.
>
> The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> are initialized to 0xF, indicating an invalid slot. If a hpte gets
> cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
> group), it is released immediately. In other words, even though 0xF is a
> valid slot, we discard it and consider it an invalid
> slot; see hpte_soft_invalid(). This gives us an opportunity to not
> depend on a bit in the primary PTE in order to determine the
> validity of a slot.
>
> When we release a hpte in the 0xF slot, we also release a
> legitimate primary slot and unmap that entry. This is to
> ensure that we do get a legitimate non-0xF slot the next time we
> retry for a slot.
>
> Though treating the 0xF slot as invalid reduces the number of available
> slots and may have an effect on performance, the probability
> of hitting a 0xF is extremely low.
>
> Compared to the current scheme, the above described scheme reduces
> the number of false hash table updates significantly and has the
> added advantage of releasing four valuable PTE bits for other
> purposes.
>
> This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> Ellerman and myself.
>
> The 4K PTE format remains unchanged for now.
>
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  | 20 +++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 32 +++
>  arch/powerpc/include/asm/book3s/64/hash.h     | 15 +++--
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  5 ++
>  arch/powerpc/mm/dump_linuxpagetables.c        |  3 +-
>  arch/powerpc/mm/hash64_4k.c                   | 14 ++---
>  arch/powerpc/mm/hash64_64k.c                  | 81 ---
>  arch/powerpc/mm/hash_utils_64.c               | 30 +++---
>  8 files changed, 122 insertions(+), 78 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index b4b5e6b..5ef1d81 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -16,6 +16,18 @@
>  #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE)
>  #define H_PGD_TABLE_SIZE (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
>
> +
> +/*
> + * Only supported by 4k linux page size
> + */
> +#define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT 56
> +
> +#define H_PAGE_BUSY  _RPAGE_RSV1 /* software: PTE & hash are busy */
> +#define H_PAGE_HASHPTE   _RPAGE_RPN43 /* PTE has associated HPTE */
> +
> +
>  /* PTE flags to conserve for HPTE identification */
>  #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
>H_PAGE_F_SECOND | H_PAGE_F_GIX)
> @@ -48,6 +60,14 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
>  }
>  #endif
>
> +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> + unsigned int subpg_index, unsigned long slot)
> +{
> + return (slot << H_PAGE_F_GIX_SHIFT) &
> + (H_PAGE_F_SECOND | H_PAGE_F_GIX);
> +}
> +
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>
>  static inline char *get_hpte_slot_array(pmd_t *pmdp)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 9732837..0eb3c89 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -10,23 +10,25 @@
>   * 64k aligned address free up few of the lower bits of RPN for us
>   * We steal that here. For more deatils look at pte_pfn/pfn_pte()
>   */
> -#define H_PAGE_COMBO _RPAGE_RPN0 /* this is a combo 4k page */
> -#define H_PAGE_4K_PFN _RPAGE_RPN1 /* PFN is for a single 4k page */
> +#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
> +#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
> +#define H_PAGE_F_SECOND  _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT   56
> +
> +
> +#define H_PAGE_BUSY  _RPAGE_RPN42 /* software: PTE & hash are busy */
> +#define H_PAGE_HASHPTE   _RPAGE_RPN43 /* PTE has associated HPTE */
> +
>  /*
>   * We need to differentiate between explicit huge page and THP huge
>   * page, since THP huge page also need to tra

Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-20 Thread Ram Pai
On Wed, Jun 21, 2017 at 11:05:33AM +0530, Anshuman Khandual wrote:
> On 06/21/2017 04:53 AM, Ram Pai wrote:
> > On Tue, Jun 20, 2017 at 03:50:25PM +0530, Anshuman Khandual wrote:
> >> On 06/17/2017 09:22 AM, Ram Pai wrote:
> >>> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> >>> in the 4K backed hpte pages. These bits continue to be used
> >>> for 64K backed hpte pages in this patch, but will be freed
> >>> up in the next patch.
> >>
> >> I believe the bit numbers 3, 4, 5 and 6 are in BE format. I was
> >> initially reading them from right to left, as we normally
> >> do in the kernel, and was getting confused. So basically these
> >> bits (which are only applicable to the 64K mapping IIUC) are going
> >> to be freed up from the PTE format.
> >>
> >> #define _RPAGE_RSV1 0x1000UL
> >> #define _RPAGE_RSV2 0x0800UL
> >> #define _RPAGE_RSV3 0x0400UL
> >> #define _RPAGE_RSV4 0x0200UL
> >>
> >> As you mentioned before, this feature is available for the 64K
> >> page size only and not for 4K mappings. So I assume we support
> >> both combinations:
> >>
> >> * 64K mapping on 64K
> >> * 64K mapping on 4K
> > 
> > yes.
> > 
> >>
> >> These are the current users of the above bits:
> >>
> >> #define H_PAGE_BUSY     _RPAGE_RSV1  /* software: PTE & hash are busy */
> >> #define H_PAGE_F_SECOND _RPAGE_RSV2  /* HPTE is in 2ndary HPTEG */
> >> #define H_PAGE_F_GIX    (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> >> #define H_PAGE_HASHPTE  _RPAGE_RPN43 /* PTE has associated HPTE */
> >>
> >>>
> >>> The patch makes the following changes to the 64K PTE format:
> >>>
> >>> H_PAGE_BUSY moves from bit 3 to bit 9.
> >>
> >> And what is in bit 9 now? This one?
> >>
> >> #define _RPAGE_SW2 0x00400
> >>
> >> which is used as:
> >>
> >> #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
> >>
> >> which will not be required any more?
> > 
> > I think you are reading bit 9 from right to left. The bit 9 I refer to
> > is counted from left to right, using the same numbering convention as ISA 3.0.
> 
> Right, my bad. Then it would be this one.
> 
> '#define _RPAGE_RPN42 0x0040UL'
> 
> > I know it is confusing; I will mention in the description of this
> > patch that it should be read the big-endian way.
> 
> Right.
> 
> > 
> > BTW: Bit 9 is not currently used, so I am using it in this patch. But this is
> > a temporary move; H_PAGE_BUSY will move to bit 7 in the next patch.
> > 
> > I had to keep it at bit 9 because bit 7 is not yet entirely freed up; it is
> > still used by 64K PTEs backed by 64K hptes.
> 
> Got it.
> 
> > 
> >>
> >>> H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> >>> of the pte.
> >>> H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> >>> second part of the pte.
> >>>
> >>> The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> >>> are initialized to 0xF, indicating an invalid slot. If a hpte gets
> >>> cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
> >>> group), it is released immediately. In other words, even though 0xF is a
> >>
> >> Does 'released immediately' mean we attempt again for a new hash slot?
> > 
> > Yes.
> > 
> >>
> >>> valid slot, we discard it and consider it an invalid
> >>> slot; see hpte_soft_invalid(). This gives us an opportunity to not
> >>> depend on a bit in the primary PTE in order to determine the
> >>> validity of a slot.
> >>
> >> So we have to check the slot number in the second half of each PTE to
> >> figure out whether it has a valid slot in the hash page table.
> > 
> > Yes.
> > 
> >>
> >>>
> >>> When we release a hpte in the 0xF slot, we also release a
> >>> legitimate primary slot and unmap that entry. This is to
> >>> ensure that we do get a legitimate non-0xF slot the next time we
> >>> retry for a slot.
> >>
> >> Okay.
> >>
> >>>
> >>> Though treating the 0xF slot as invalid reduces the number of available
> >>> slots and may have an effect on performance, the probability
> >>> of hitting a 0xF is extremely low.
> >>
> >> Why do you say that? I thought every slot number has the same
> >> probability of being hit by the hash function.
> > 
> > Every hash bucket has the same probability. But the slots within a
> > hash bucket are filled sequentially, so it takes 15 hptes hashing to
> > the same bucket before the next one gets to 0xF, the last slot of the secondary.
> 
> Okay, would the last one be the 16th instead?
> 
> > 
> >>
> >>>
> >>> Compared to the current scheme, the above described scheme reduces
> >>> the number of false hash table updates significantly and has the
> >>
> >> How does it reduce false hash table updates?
> > 
> > Earlier, we had 1 bit allocated in the first part of the 64K PTE
> > for every four consecutive 4K hptes. If any one 4K hpte got hashed in,
> > the bit got set. Wh

Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-20 Thread Anshuman Khandual
On 06/21/2017 04:53 AM, Ram Pai wrote:
> On Tue, Jun 20, 2017 at 03:50:25PM +0530, Anshuman Khandual wrote:
>> On 06/17/2017 09:22 AM, Ram Pai wrote:
>>> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
>>> in the 4K backed hpte pages. These bits continue to be used
>>> for 64K backed hpte pages in this patch, but will be freed
>>> up in the next patch.
>>
>> I believe the bit numbers 3, 4, 5 and 6 are in BE format. I was
>> initially reading them from right to left, as we normally
>> do in the kernel, and was getting confused. So basically these
>> bits (which are only applicable to the 64K mapping IIUC) are going
>> to be freed up from the PTE format.
>>
>> #define _RPAGE_RSV1  0x1000UL
>> #define _RPAGE_RSV2  0x0800UL
>> #define _RPAGE_RSV3  0x0400UL
>> #define _RPAGE_RSV4  0x0200UL
>>
>> As you mentioned before, this feature is available for the 64K
>> page size only and not for 4K mappings. So I assume we support
>> both combinations:
>>
>> * 64K mapping on 64K
>> * 64K mapping on 4K
> 
> yes.
> 
>>
>> These are the current users of the above bits:
>>
>> #define H_PAGE_BUSY     _RPAGE_RSV1  /* software: PTE & hash are busy */
>> #define H_PAGE_F_SECOND _RPAGE_RSV2  /* HPTE is in 2ndary HPTEG */
>> #define H_PAGE_F_GIX    (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
>> #define H_PAGE_HASHPTE  _RPAGE_RPN43 /* PTE has associated HPTE */
>>
>>>
>>> The patch makes the following changes to the 64K PTE format:
>>>
>>> H_PAGE_BUSY moves from bit 3 to bit 9.
>>
>> And what is in bit 9 now? This one?
>>
>> #define _RPAGE_SW2   0x00400
>>
>> which is used as:
>>
>> #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
>>
>> which will not be required any more?
> 
> I think you are reading bit 9 from right to left. The bit 9 I refer to
> is counted from left to right, using the same numbering convention as ISA 3.0.

Right, my bad. Then it would be this one.

'#define _RPAGE_RPN42   0x0040UL'

> I know it is confusing; I will mention in the description of this
> patch that it should be read the big-endian way.

Right.

> 
> BTW: Bit 9 is not currently used, so I am using it in this patch. But this is
> a temporary move; H_PAGE_BUSY will move to bit 7 in the next patch.
> 
> I had to keep it at bit 9 because bit 7 is not yet entirely freed up; it is
> still used by 64K PTEs backed by 64K hptes.

Got it.

> 
>>
>>> H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
>>> of the pte.
>>> H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
>>> second part of the pte.
>>>
>>> The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
>>> are initialized to 0xF, indicating an invalid slot. If a hpte gets
>>> cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
>>> group), it is released immediately. In other words, even though 0xF is a
>>
>> Does 'released immediately' mean we attempt again for a new hash slot?
> 
> Yes.
> 
>>
>>> valid slot, we discard it and consider it an invalid
>>> slot; see hpte_soft_invalid(). This gives us an opportunity to not
>>> depend on a bit in the primary PTE in order to determine the
>>> validity of a slot.
>>
>> So we have to check the slot number in the second half of each PTE to
>> figure out whether it has a valid slot in the hash page table.
> 
> Yes.
> 
>>
>>>
>>> When we release a hpte in the 0xF slot, we also release a
>>> legitimate primary slot and unmap that entry. This is to
>>> ensure that we do get a legitimate non-0xF slot the next time we
>>> retry for a slot.
>>
>> Okay.
>>
>>>
>>> Though treating the 0xF slot as invalid reduces the number of available
>>> slots and may have an effect on performance, the probability
>>> of hitting a 0xF is extremely low.
>>
>> Why do you say that? I thought every slot number has the same
>> probability of being hit by the hash function.
> 
> Every hash bucket has the same probability. But the slots within a
> hash bucket are filled sequentially, so it takes 15 hptes hashing to
> the same bucket before the next one gets to 0xF, the last slot of the secondary.

Okay, would the last one be the 16th instead?

> 
>>
>>>
>>> Compared to the current scheme, the above described scheme reduces
>>> the number of false hash table updates significantly and has the
>>
>> How does it reduce false hash table updates?
> 
> Earlier, we had 1 bit allocated in the first part of the 64K PTE
> for every four consecutive 4K hptes. If any one 4K hpte got hashed in,
> the bit got set. This means that any time we faulted on the remaining
> three 4K hptes, we saw the bit already set and erroneously tried to
> update that hpte. So we had a 75% update error rate. Functionally
> harmless, but bad from a performance point of view.

I am a bit out of sync regarding these PTE bits since Aneesh's radix
changes went in :) Will look into this more closely.

> 
> With th

Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-20 Thread Ram Pai
On Tue, Jun 20, 2017 at 03:50:25PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> > in the 4K backed hpte pages. These bits continue to be used
> > for 64K backed hpte pages in this patch, but will be freed
> > up in the next patch.
> 
> I believe the bit numbers 3, 4, 5 and 6 are in BE format. I was
> initially reading them from right to left, as we normally
> do in the kernel, and was getting confused. So basically these
> bits (which are only applicable to the 64K mapping IIUC) are going
> to be freed up from the PTE format.
> 
> #define _RPAGE_RSV1   0x1000UL
> #define _RPAGE_RSV2   0x0800UL
> #define _RPAGE_RSV3   0x0400UL
> #define _RPAGE_RSV4   0x0200UL
> 
> As you mentioned before, this feature is available for the 64K
> page size only and not for 4K mappings. So I assume we support
> both combinations:
> 
> * 64K mapping on 64K
> * 64K mapping on 4K

Yes.

> 
> These are the current users of the above bits:
> 
> #define H_PAGE_BUSY     _RPAGE_RSV1  /* software: PTE & hash are busy */
> #define H_PAGE_F_SECOND _RPAGE_RSV2  /* HPTE is in 2ndary HPTEG */
> #define H_PAGE_F_GIX    (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> #define H_PAGE_HASHPTE  _RPAGE_RPN43 /* PTE has associated HPTE */
> 
> > 
> > The patch makes the following changes to the 64K PTE format:
> > 
> > H_PAGE_BUSY moves from bit 3 to bit 9.
> 
> And what is in bit 9 now? This one?
> 
> #define _RPAGE_SW2 0x00400
> 
> which is used as:
> 
> #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
> 
> which will not be required any more?

I think you are reading bit 9 from right to left. The bit 9 I refer to
is counted from left to right, using the same numbering convention as ISA 3.0.
I know it is confusing; I will mention in the description of this
patch that it should be read the big-endian way.

BTW: Bit 9 is not currently used, so I am using it in this patch. But this is
a temporary move; H_PAGE_BUSY will move to bit 7 in the next patch.

I had to keep it at bit 9 because bit 7 is not yet entirely freed up; it is
still used by 64K PTEs backed by 64K hptes.

> 
> > H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> > of the pte.
> > H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> > second part of the pte.
> > 
> > The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> > are initialized to 0xF, indicating an invalid slot. If a hpte gets
> > cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
> > group), it is released immediately. In other words, even though 0xF is a
> 
> Does 'released immediately' mean we attempt again for a new hash slot?

Yes.

> 
> > valid slot, we discard it and consider it an invalid
> > slot; see hpte_soft_invalid(). This gives us an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
> 
> So we have to check the slot number in the second half of each PTE to
> figure out whether it has a valid slot in the hash page table.

Yes.

> 
> > 
> > When we release a hpte in the 0xF slot, we also release a
> > legitimate primary slot and unmap that entry. This is to
> > ensure that we do get a legitimate non-0xF slot the next time we
> > retry for a slot.
> 
> Okay.
> 
> > 
> > Though treating the 0xF slot as invalid reduces the number of available
> > slots and may have an effect on performance, the probability
> > of hitting a 0xF is extremely low.
> 
> Why do you say that? I thought every slot number has the same
> probability of being hit by the hash function.

Every hash bucket has the same probability. But the slots within a
hash bucket are filled sequentially, so it takes 15 hptes hashing to
the same bucket before the next one gets to 0xF, the last slot of the secondary.

> 
> > 
> > Compared to the current scheme, the above described scheme reduces
> > the number of false hash table updates significantly and has the
> 
> How does it reduce false hash table updates?

Earlier, we had 1 bit allocated in the first part of the 64K PTE
for every four consecutive 4K hptes. If any one 4K hpte got hashed in,
the bit got set. This means that any time we faulted on the remaining
three 4K hptes, we saw the bit already set and erroneously tried to
update that hpte. So we had a 75% update error rate. Functionally
harmless, but bad from a performance point of view.

With the new scheme, we decide whether a 4K slot is valid by looking
at its value rather than depending on a bit in the main PTE. So
there is no chance of being misled, and hence no chance of trying
to update an invalid hpte. This should improve performance and at the same
time give us four valuable PTE bits.


> 
> > added advantage of releasing four valuable PTE bits for other
> > purposes.
> > 
> > This idea was jointly 

Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-20 Thread Anshuman Khandual
On 06/17/2017 09:22 AM, Ram Pai wrote:
> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> in the 4K backed hpte pages. These bits continue to be used
> for 64K backed hpte pages in this patch, but will be freed
> up in the next patch.

I believe the bit numbers 3, 4, 5 and 6 are in BE format. I was
initially reading them from right to left, as we normally
do in the kernel, and was getting confused. So basically these
bits (which are only applicable to the 64K mapping IIUC) are going
to be freed up from the PTE format.

#define _RPAGE_RSV1 0x1000UL
#define _RPAGE_RSV2 0x0800UL
#define _RPAGE_RSV3 0x0400UL
#define _RPAGE_RSV4 0x0200UL

As you mentioned before, this feature is available for the 64K
page size only and not for 4K mappings. So I assume we support
both combinations:

* 64K mapping on 64K
* 64K mapping on 4K

These are the current users of the above bits:

#define H_PAGE_BUSY     _RPAGE_RSV1  /* software: PTE & hash are busy */
#define H_PAGE_F_SECOND _RPAGE_RSV2  /* HPTE is in 2ndary HPTEG */
#define H_PAGE_F_GIX    (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
#define H_PAGE_HASHPTE  _RPAGE_RPN43 /* PTE has associated HPTE */

> 
> The patch makes the following changes to the 64K PTE format:
> 
> H_PAGE_BUSY moves from bit 3 to bit 9.

And what is in bit 9 now? This one?

#define _RPAGE_SW2  0x00400

which is used as:

#define _PAGE_SPECIAL   _RPAGE_SW2 /* software: special page */

which will not be required any more?

> H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> of the pte.
> H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> second part of the pte.
> 
> The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> are initialized to 0xF, indicating an invalid slot. If a hpte gets
> cached in a 0xF slot (i.e. slot 7, the last slot of the secondary
> group), it is released immediately. In other words, even though 0xF is a

Does 'released immediately' mean we attempt again for a new hash slot?

> valid slot, we discard it and consider it an invalid
> slot; see hpte_soft_invalid(). This gives us an opportunity to not
> depend on a bit in the primary PTE in order to determine the
> validity of a slot.

So we have to check the slot number in the second half of each PTE to
figure out whether it has a valid slot in the hash page table.

> 
> When we release a hpte in the 0xF slot, we also release a
> legitimate primary slot and unmap that entry. This is to
> ensure that we do get a legitimate non-0xF slot the next time we
> retry for a slot.

Okay.

> 
> Though treating the 0xF slot as invalid reduces the number of available
> slots and may have an effect on performance, the probability
> of hitting a 0xF is extremely low.

Why do you say that? I thought every slot number has the same
probability of being hit by the hash function.

> 
> Compared to the current scheme, the above described scheme reduces
> the number of false hash table updates significantly and has the

How does it reduce false hash table updates?

> added advantage of releasing four valuable PTE bits for other
> purposes.
> 
> This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> Ellerman and myself.
> 
> The 4K PTE format remains unchanged for now.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  | 20 +++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 32 +++
>  arch/powerpc/include/asm/book3s/64/hash.h     | 15 +++--
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  5 ++
>  arch/powerpc/mm/dump_linuxpagetables.c        |  3 +-
>  arch/powerpc/mm/hash64_4k.c                   | 14 ++---
>  arch/powerpc/mm/hash64_64k.c                  | 81 ---
>  arch/powerpc/mm/hash_utils_64.c               | 30 +++---
>  8 files changed, 122 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index b4b5e6b..5ef1d81 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -16,6 +16,18 @@
>  #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE)
>  #define H_PGD_TABLE_SIZE (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
> 
> +
> +/*
> + * Only supported by 4k linux page size
> + */
> +#define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT 56
> +
> +#define H_PAGE_BUSY  _RPAGE_RSV1 /* software: PTE & hash are busy */
> +#define H_PAGE_HASHPTE   _RPAGE_RPN43 /* PTE has associated HPTE */
> +
> +

So we moved the common 64K definitions here.


>  /* PTE flags to conserve for HPTE identification */
>  #define _PAGE_HPTEFLAGS (H