Re: [RFC v3 01/23] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-06-22 Thread Ram Pai
On Thu, Jun 22, 2017 at 07:21:03PM +1000, Balbir Singh wrote:
> On Wed, 2017-06-21 at 18:39 -0700, Ram Pai wrote:
> > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6,
> > in the 4K backed HPTE pages. These bits continue to be used
> > for 64K backed HPTE pages in this patch, but will be freed
> > up in the next patch. The bit numbers are big-endian as
> > defined in the ISA3.0.
> > 
> > The patch makes the following changes to the 64K PTE format:
> >
> 
> Why can't we stuff the bits in the VMA and retrieve it from there?
> Basically always get a minor fault in hash and for keys handle
> the fault in do_page_fault() and handle the keys from the VMA?

I think you raise a valid point. We don't necessarily have to program
the pte; the hpte can be programmed directly from the key in the vma.
It is just that the code becomes a little ugly, since the
_hash_page_*() functions do not have access to the vma.

However, we are also trying to maintain consistency between the hpte and
rpte implementations. The keys have to be programmed into the rpte.
The patch works towards that consistency, so that the same code can
work on both: hpte for now and rpte in the future.

Maybe I can just do what you propose. However, this patch by itself
has value, because it frees up four valuable pte bits, irrespective
of whether we use them for memory keys. Let me see what others have
to say.

Aneesh: thoughts?

> 
> > H_PAGE_BUSY moves from bit 3 to bit 9.
> > H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
> > of the pte.
> > H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
> > second part of the pte.
> > 
> > The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> > are initialized to 0xF, indicating an invalid slot. If a HPTE
> > gets cached in a 0xF slot (i.e. the 7th slot of the secondary), it is
> > released immediately. In other words, even though 0xF is a
> > valid slot, we discard it and consider it an invalid
> > slot; i.e. HPTE(). This gives us an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
> 
> This is not clear, could you please rephrase? What is the bit in the
> primary key we rely on?

The (H_PAGE_F_SECOND|H_PAGE_F_GIX) bits, which are big-endian bits 3, 4, 5
and 6. They are currently used to track the validity of the 4k-hptes
backing the 64k-pte. Each bit tracks four 4k-hptes, for a total of
sixteen 4k-hptes.
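
To make the current (pre-patch) tracking concrete, here is a minimal C
sketch. It is not the kernel code: it assumes the four bits are
right-aligned into a 4-bit mask, and the names combo_valid_t,
subpage_group(), mark_subpage_valid() and subpage_maybe_valid() are
hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the four (H_PAGE_F_SECOND | H_PAGE_F_GIX)
 * bits, right-aligned into a 4-bit mask for illustration only.
 */
typedef uint8_t combo_valid_t;

/* Each valid bit covers a group of four consecutive 4k subpages. */
static inline unsigned int subpage_group(unsigned int subpg_index)
{
	assert(subpg_index < 16);	/* sixteen 4k-hptes back one 64k-pte */
	return subpg_index >> 2;	/* 0..15 -> 0..3 */
}

/* Record that the 4k-hpte for subpg_index has been hashed in. */
static inline combo_valid_t mark_subpage_valid(combo_valid_t valid,
					       unsigned int subpg_index)
{
	return (combo_valid_t)(valid | (1u << subpage_group(subpg_index)));
}

/*
 * True if the group containing subpg_index has its bit set. This can be
 * true for a subpage that was never hashed in, because its three
 * neighbours share the same bit.
 */
static inline bool subpage_maybe_valid(combo_valid_t valid,
				       unsigned int subpg_index)
{
	return ((valid >> subpage_group(subpg_index)) & 1u) != 0;
}
```

With one bit per four subpages, marking subpage 5 valid also makes
subpages 4, 6 and 7 look valid, so a per-subpage answer is not available
from these bits alone.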


> 
> > 
> > When we release a HPTE in the 0xF slot, we also release a
> > legitimate primary slot and unmap that entry. This is to
> > ensure that we do get a legitimate non-0xF slot the next time we
> > retry for a slot.
> > 
> > Though treating the 0xF slot as invalid reduces the number of available
> > slots and may have an effect on performance, the probability
> > of hitting a 0xF is extremely low.
> > 
> > Compared to the current scheme, the above described scheme reduces
> > the number of false hash table updates significantly and has the
> > added advantage of releasing four valuable PTE bits for other
> > purposes.
> > 
> > This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> > Ellerman and myself.
> >
> 
> It would be helpful if you had a text diagram explaining the PTE bits
> before and after.

ok. will add it in the next version.

> 
> > The 4K PTE format remains unchanged currently.
> >
> 
> The code seems to be doing a lot more than the changelog suggests. A few
> functions are completely removed, common code between 64K and 4K has been
> split under #ifndef. It would be good to call all of these out.

ok. will do.

> 
> > Signed-off-by: Ram Pai 
> > 
> > Conflicts:
> > arch/powerpc/include/asm/book3s/64/hash.h
> > ---
> >  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
> >  arch/powerpc/include/asm/book3s/64/hash-64k.h | 17 ---
> >  arch/powerpc/include/asm/book3s/64/hash.h     | 12 +++--
> >  arch/powerpc/mm/hash64_64k.c                  | 70 +++
> >  arch/powerpc/mm/hash_utils_64.c               |  4 +-
> >  5 files changed, 66 insertions(+), 44 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > index b4b5e6b..9c2c8f1 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > @@ -16,6 +16,13 @@
> >  #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
> >  #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
> >  
> > +#define H_PAGE_F_SECOND    _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> > +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> > +#define H_PAGE_F_GIX_SHIFT 56
> > +
> > +#define H_PAGE_BUSY    _RPAGE_RSV1 /* software: PTE & hash are busy */
> > +#define H_PAGE_HASHPTE _RPAGE_RPN43 /* PTE has associated HPTE */

Re: [RFC v3 01/23] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-06-22 Thread Balbir Singh
On Wed, 2017-06-21 at 18:39 -0700, Ram Pai wrote:
> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6,
> in the 4K backed HPTE pages. These bits continue to be used
> for 64K backed HPTE pages in this patch, but will be freed
> up in the next patch. The bit numbers are big-endian as
> defined in the ISA3.0.
> 
> The patch makes the following changes to the 64K PTE format:
>

Why can't we stuff the bits in the VMA and retrieve it from there?
Basically always get a minor fault in hash and for keys handle
the fault in do_page_fault() and handle the keys from the VMA?
 
> H_PAGE_BUSY moves from bit 3 to bit 9.
> H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
>   of the pte.
> H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
>   second part of the pte.
> 
> The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> are initialized to 0xF, indicating an invalid slot. If a HPTE
> gets cached in a 0xF slot (i.e. the 7th slot of the secondary), it is
> released immediately. In other words, even though 0xF is a
> valid slot, we discard it and consider it an invalid
> slot; i.e. HPTE(). This gives us an opportunity to not
> depend on a bit in the primary PTE in order to determine the
> validity of a slot.

This is not clear, could you please rephrase? What is the bit in the
primary key we rely on?

> 
> When we release a HPTE in the 0xF slot, we also release a
> legitimate primary slot and unmap that entry. This is to
> ensure that we do get a legitimate non-0xF slot the next time we
> retry for a slot.
> 
> Though treating the 0xF slot as invalid reduces the number of available
> slots and may have an effect on performance, the probability
> of hitting a 0xF is extremely low.
> 
> Compared to the current scheme, the above described scheme reduces
> the number of false hash table updates significantly and has the
> added advantage of releasing four valuable PTE bits for other
> purposes.
> 
> This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> Ellerman and myself.
>

It would be helpful if you had a text diagram explaining the PTE bits
before and after.
 
> The 4K PTE format remains unchanged currently.
>

The code seems to be doing a lot more than the changelog suggests. A few
functions are completely removed, common code between 64K and 4K has been
split under #ifndef. It would be good to call all of these out.
 
> Signed-off-by: Ram Pai 
> 
> Conflicts:
>   arch/powerpc/include/asm/book3s/64/hash.h
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 17 ---
>  arch/powerpc/include/asm/book3s/64/hash.h     | 12 +++--
>  arch/powerpc/mm/hash64_64k.c                  | 70 +++
>  arch/powerpc/mm/hash_utils_64.c               |  4 +-
>  5 files changed, 66 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index b4b5e6b..9c2c8f1 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -16,6 +16,13 @@
>  #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE)
>  #define H_PGD_TABLE_SIZE (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
>  
> +#define H_PAGE_F_SECOND    _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT 56
> +
> +#define H_PAGE_BUSY    _RPAGE_RSV1 /* software: PTE & hash are busy */
> +#define H_PAGE_HASHPTE _RPAGE_RPN43 /* PTE has associated HPTE */
> +
>  /* PTE flags to conserve for HPTE identification */
>  #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
>                           H_PAGE_F_SECOND | H_PAGE_F_GIX)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 9732837..3f49941 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -10,20 +10,21 @@
>   * 64k aligned address free up few of the lower bits of RPN for us
>   * We steal that here. For more deatils look at pte_pfn/pfn_pte()
>   */
> -#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
> -#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
> +#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
> +#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */

It would be good to split these out as cleanups. I can't see anything
changed above, so it is a little confusing to review.

> +#define H_PAGE_F_SECOND  _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT   56
> +
> +#define H_PAGE_BUSY  _RPAGE_RPN42 /* software: PTE & hash are busy */
> 

[RFC v3 01/23] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-06-21 Thread Ram Pai
Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6,
in the 4K backed HPTE pages. These bits continue to be used
for 64K backed HPTE pages in this patch, but will be freed
up in the next patch. The bit numbers are big-endian as
defined in the ISA3.0.

The patch makes the following changes to the 64K PTE format:

H_PAGE_BUSY moves from bit 3 to bit 9.
H_PAGE_F_SECOND, which occupied bit 4, moves to the second part
of the pte.
H_PAGE_F_GIX, which occupied bits 5, 6 and 7, also moves to the
second part of the pte.

The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
are initialized to 0xF, indicating an invalid slot. If a HPTE
gets cached in a 0xF slot (i.e. the 7th slot of the secondary), it is
released immediately. In other words, even though 0xF is a
valid slot, we discard it and consider it an invalid
slot; i.e. HPTE(). This gives us an opportunity to not
depend on a bit in the primary PTE in order to determine the
validity of a slot.
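
As a rough illustration of the slot encoding described above, here is a
minimal C sketch. It is not the code in this patch. It assumes that the
second part of the pte holds one 4-bit slot value per 4k subpage (sixteen
values packed into 64 bits), with bit 3 of each value meaning "secondary
group" and bits 0-2 the index within the group. All names (hidx_get(),
hidx_set(), slot_is_valid() and the macros) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_BITS		4
#define SLOT_MASK		0xFull
#define SLOT_INVALID		0xFu	/* secondary group, index 7: "no slot" */
#define HIDX_ALL_INVALID	0xFFFFFFFFFFFFFFFFull	/* sixteen slots, all 0xF */

/* Read the 4-bit slot value recorded for one 4k subpage. */
static inline unsigned int hidx_get(uint64_t hidx, unsigned int subpg_index)
{
	assert(subpg_index < 16);
	return (unsigned int)((hidx >> (subpg_index * SLOT_BITS)) & SLOT_MASK);
}

/* Return a copy of hidx with the slot value for one subpage replaced. */
static inline uint64_t hidx_set(uint64_t hidx, unsigned int subpg_index,
				unsigned int slot)
{
	unsigned int shift;

	assert(subpg_index < 16 && slot <= 0xF);
	shift = subpg_index * SLOT_BITS;
	hidx &= ~(SLOT_MASK << shift);
	return hidx | ((uint64_t)slot << shift);
}

/* Validity comes from the slot value itself, not from a separate bit. */
static inline bool slot_is_valid(unsigned int slot)
{
	return slot != SLOT_INVALID;
}

static inline bool slot_in_secondary(unsigned int slot)
{
	return (slot & 0x8) != 0;
}

static inline unsigned int slot_group_index(unsigned int slot)
{
	return slot & 0x7;
}
```

Under these assumptions a freshly initialized pte needs no extra valid
bits: every subpage reads back 0xF until a real slot is stored for it.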

When we release a HPTE in the 0xF slot, we also release a
legitimate primary slot and unmap that entry. This is to
ensure that we do get a legitimate non-0xF slot the next time we
retry for a slot.
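
The release-and-retry step can be sketched with a toy model in C. This is
only an illustration under stated assumptions, not the patch's code: the
group pair is modelled as two arrays of eight flags, insertion takes the
first free slot (primary group first), and the evicted primary entry is
simply index 0. The names toy_hpteg, toy_insert() and
toy_insert_avoiding_0xf() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define GROUP_SLOTS	8
#define SLOT_INVALID	0xF	/* secondary group, index 7 */

/* Toy model of one primary/secondary HPTE group pair; true = occupied. */
struct toy_hpteg {
	bool primary[GROUP_SLOTS];
	bool secondary[GROUP_SLOTS];
};

/*
 * Take the first free slot, primary group first. Returns the 4-bit slot
 * value (bit 3 = secondary, bits 0-2 = index), or -1 if both groups
 * are full.
 */
static int toy_insert(struct toy_hpteg *g)
{
	int i;

	for (i = 0; i < GROUP_SLOTS; i++) {
		if (!g->primary[i]) {
			g->primary[i] = true;
			return i;
		}
	}
	for (i = 0; i < GROUP_SLOTS; i++) {
		if (!g->secondary[i]) {
			g->secondary[i] = true;
			return 0x8 | i;
		}
	}
	return -1;
}

/*
 * Insert without ever handing back 0xF. If the entry lands in the 0xF
 * slot, release it, evict one primary entry, and retry so that the
 * retry finds a non-0xF slot.
 */
static int toy_insert_avoiding_0xf(struct toy_hpteg *g)
{
	int slot = toy_insert(g);

	if (slot == SLOT_INVALID) {
		g->secondary[7] = false;	/* release the HPTE in the 0xF slot */
		g->primary[0] = false;		/* evict a primary entry (arbitrary victim) */
		slot = toy_insert(g);		/* retry lands in the freed primary slot */
	}
	return slot;
}
```

In this model the 0xF slot is only reached when the other fifteen slots
are already occupied. The both-groups-full case (-1) is left to the
caller here.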

Though treating the 0xF slot as invalid reduces the number of available
slots and may have an effect on performance, the probability
of hitting a 0xF is extremely low.

Compared to the current scheme, the above described scheme reduces
the number of false hash table updates significantly and has the
added advantage of releasing four valuable PTE bits for other
purposes.

This idea was jointly developed by Paul Mackerras, Aneesh, Michael
Ellerman and myself.

The 4K PTE format remains unchanged currently.

Signed-off-by: Ram Pai 

Conflicts:
arch/powerpc/include/asm/book3s/64/hash.h
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 17 ---
 arch/powerpc/include/asm/book3s/64/hash.h     | 12 +++--
 arch/powerpc/mm/hash64_64k.c                  | 70 +++
 arch/powerpc/mm/hash_utils_64.c               |  4 +-
 5 files changed, 66 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index b4b5e6b..9c2c8f1 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -16,6 +16,13 @@
 #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
 #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
 
+#define H_PAGE_F_SECOND    _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
+#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
+#define H_PAGE_F_GIX_SHIFT 56
+
+#define H_PAGE_BUSY    _RPAGE_RSV1 /* software: PTE & hash are busy */
+#define H_PAGE_HASHPTE _RPAGE_RPN43 /* PTE has associated HPTE */
+
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
 H_PAGE_F_SECOND | H_PAGE_F_GIX)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 9732837..3f49941 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -10,20 +10,21 @@
  * 64k aligned address free up few of the lower bits of RPN for us
  * We steal that here. For more deatils look at pte_pfn/pfn_pte()
  */
-#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
-#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
+#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
+#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
+#define H_PAGE_F_SECOND    _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
+#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
+#define H_PAGE_F_GIX_SHIFT 56
+
+#define H_PAGE_BUSY    _RPAGE_RPN42 /* software: PTE & hash are busy */
+#define H_PAGE_HASHPTE _RPAGE_RPN43 /* PTE has associated HPTE */
+
 /*
  * We need to differentiate between explicit huge page and THP huge
  * page, since THP huge page also need to track real subpage details
  */
 #define H_PAGE_THP_HUGE  H_PAGE_4K_PFN
 
-/*
- * Used to track subpage group valid if H_PAGE_COMBO is set
- * This overloads H_PAGE_F_GIX and H_PAGE_F_SECOND
- */
-#define H_PAGE_COMBO_VALID (H_PAGE_F_GIX | H_PAGE_F_SECOND)
-
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_F_SECOND | \
 H_PAGE_F_GIX | H_PAGE_HASHPTE | H_PAGE_COMBO)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 4e957b0..ac049de 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -8,11 +8,8 @@
  *
  */
 #define H_PTE_NONE_MASK    _PAGE_HPTEFLAGS
