Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-21 Thread Christophe Leroy



On 19/09/2016 at 07:45, Aneesh Kumar K.V wrote:

Christophe Leroy  writes:


Today there are two implementations of hugetlbpages which are managed
by exclusive #ifdefs:
* FSL_BOOKE: several directory entries point to the same single hugepage
* BOOK3S: one upper level directory entry points to a table of hugepages

In preparation for implementing hugepage support on the 8xx, we
need a mix of the two above solutions, because the 8xx needs both cases
depending on the size of pages:
* In 4k page size mode, each PGD entry covers a 4M bytes area. It means
that 2 PGD entries will be necessary to cover an 8M hugepage while a
single PGD entry will cover 8x 512k hugepages.
* In 16k page size mode, each PGD entry covers a 64M bytes area. It means
that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
hugepages will be covered by one PGD entry.

This patch:
* removes #ifdefs in favor of if/else based on the range sizes
* merges the two huge_pte_alloc() functions as they are pretty similar
* merges the two hugetlbpage_init() functions as they are pretty similar

Signed-off-by: Christophe Leroy 
---
v2: This part is new and results from a split of the last patch of the v1
series in two parts

 arch/powerpc/mm/hugetlbpage.c | 189 +-
 1 file changed, 77 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8a512b1..2119f00 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -64,14 +64,16 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 {
struct kmem_cache *cachep;
pte_t *new;
-
-#ifdef CONFIG_PPC_FSL_BOOK3E
int i;
-   int num_hugepd = 1 << (pshift - pdshift);
-   cachep = hugepte_cache;
-#else
-   cachep = PGT_CACHE(pdshift - pshift);
-#endif
+   int num_hugepd;
+
+   if (pshift >= pdshift) {
+   cachep = hugepte_cache;
+   num_hugepd = 1 << (pshift - pdshift);
+   } else {
+   cachep = PGT_CACHE(pdshift - pshift);
+   num_hugepd = 1;
+   }


Is there a way to hint the likely/unlikely branch based on the page size
selected at build time ?


Is that really worth it? Won't it be negligible compared to other
actions in that function (like, for instance, kmem_cache_zalloc())?

Can't we just trust GCC on that one?
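
For what it's worth, a minimal standalone sketch of the kind of hint being
discussed -- the macro and shift values below are assumptions for
illustration, not the actual powerpc code:

#include <stdio.h>

/* Simplified stand-in for the kernel's hint macro (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)

static int num_hugepd_for(int pshift, int pdshift)
{
	/* When the shifts are compile-time constants at the call site, GCC
	 * can fold the branch away entirely; otherwise likely()/unlikely()
	 * only reorders the generated code, it does not remove work. */
	if (likely(pshift >= pdshift))
		return 1 << (pshift - pdshift);	/* hugepte_cache case */
	return 1;				/* PGT_CACHE case */
}

int main(void)
{
	printf("num_hugepd = %d\n", num_hugepd_for(23, 22));
	return 0;
}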

Christophe


Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-19 Thread Christophe Leroy



On 20/09/2016 at 04:28, Aneesh Kumar K.V wrote:

christophe leroy  writes:


On 19/09/2016 at 07:50, Aneesh Kumar K.V wrote:


Christophe Leroy  writes:

+#else
+static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
+{
+   BUG();
+}
+
 #endif



I was expecting that BUG would get removed in the next patch, but I don't
see it there. Considering:

@@ -475,11 +453,10 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift
for (i = 0; i < num_hugepd; i++, hpdp++)
hpdp->pd = 0;

-#ifdef CONFIG_PPC_FSL_BOOK3E
-   hugepd_free(tlb, hugepte);
-#else
-   pgtable_free_tlb(tlb, hugepte, pdshift - shift);
-#endif
+   if (shift >= pdshift)
+   hugepd_free(tlb, hugepte);
+   else
+   pgtable_free_tlb(tlb, hugepte, pdshift - shift);
 }

What is it that I am missing?



Previously, the call to hugepd_free() was compiled only under #ifdef
CONFIG_PPC_FSL_BOOK3E.
Now it is compiled at all times, but it should never be called outside
CONFIG_PPC_FSL_BOOK3E because we always have shift < pdshift in that case.
So the function needs to be defined anyway but should never be called.
Should I just define it as an empty static inline?



For 8M with 4K mode, we have shift >= pdshift, right?



Yes, that's the reason for the change below in the following patch. That
way we get a real hugepd_free() also for the 8xx:


@@ -366,7 +373,7 @@ int alloc_bootmem_huge_page(struct hstate *hstate)
 }
 #endif

-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 #define HUGEPD_FREELIST_SIZE \
((PAGE_SIZE - sizeof(struct hugepd_freelist)) / sizeof(pte_t))
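
As an aside, a standalone sketch of the two stub flavours under discussion
(simplified stand-ins, not the actual patch; BUG() is emulated with assert):

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct mmu_gather;			/* opaque stand-in */
#define BUG() assert(0 && "unreachable")

#define FSL_BOOK3E_BUILD 0		/* assumed: a !CONFIG_PPC_FSL_BOOK3E build */

#if FSL_BOOK3E_BUILD
static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
{
	/* the real freelist-based implementation would live here */
}
#else
/* With the runtime dispatch below this stub is never reached (shift <
 * pdshift always holds on these configs), so an empty static inline
 * would do just as well; BUG() merely makes a broken assumption loud. */
static inline void hugepd_free(struct mmu_gather *tlb, void *hugepte)
{
	(void)tlb; (void)hugepte;
	BUG();
}
#endif

static void free_hugepd_range(struct mmu_gather *tlb, void *hugepte,
			      int shift, int pdshift)
{
	if (shift >= pdshift)
		hugepd_free(tlb, hugepte);
	else
		printf("pgtable_free_tlb(..., %d)\n", pdshift - shift);
}

int main(void)
{
	free_hugepd_range(NULL, NULL, 17, 22);	/* shift < pdshift: safe path */
	return 0;
}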



Christophe


Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-19 Thread Aneesh Kumar K.V
christophe leroy  writes:

>>
>>
>>> for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
>>> unsigned shift;
>>> unsigned pdshift;
>>> @@ -860,16 +807,31 @@ static int __init hugetlbpage_init(void)
>>>  * if we have pdshift and shift value same, we don't
>>>  * use pgt cache for hugepd.
>>>  */
>>> -   if (pdshift != shift) {
>>> +   if (pdshift > shift) {
>>> pgtable_cache_add(pdshift - shift, NULL);
>>> if (!PGT_CACHE(pdshift - shift))
>>> panic("hugetlbpage_init(): could not create "
>>>   "pgtable cache for %d bit pagesize\n", shift);
>>> +   } else if (!hugepte_cache) {
>>> +   /*
>>> +* Create a kmem cache for hugeptes.  The bottom bits in
>>> +* the pte have size information encoded in them, so
>>> +* align them to allow this
>>> +*/
>>> +   hugepte_cache = kmem_cache_create("hugepte-cache",
>>> + sizeof(pte_t),
>>> + HUGEPD_SHIFT_MASK + 1,
>>> + 0, NULL);
>>> +   if (hugepte_cache == NULL)
>>> +   panic("%s: Unable to create kmem cache "
>>> + "for hugeptes\n", __func__);
>>> +
>>
>>
>> We don't need hugepte_cache for book3s 64K. I guess we will end up
>> creating one here?
>
> Should not, because on book3s 64k, we will have pdshift > shift
> won't we ?
>

On 64k book3s we have pdshift == shift, and we don't need to create a
hugepd cache there.
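
To spell out the three init-time cases the thread distinguishes, a small
standalone sketch (the shift values are illustrative assumptions, not read
from the kernel's mmu_psize tables):

#include <stdio.h>

static const char *cache_for(unsigned int pdshift, unsigned int shift)
{
	if (pdshift > shift)
		return "PGT_CACHE(pdshift - shift)";	/* table of hugeptes */
	else if (pdshift == shift)
		return "no cache needed";		/* the book3s 64k case */
	else
		return "hugepte_cache";			/* multiple dir entries */
}

int main(void)
{
	printf("8xx 4k mode, 512k page: %s\n", cache_for(22, 19));
	printf("8xx 4k mode, 8M page:   %s\n", cache_for(22, 23));
	printf("book3s 64k (pdshift == shift): %s\n", cache_for(24, 24));
	return 0;
}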

-aneesh



Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-19 Thread Aneesh Kumar K.V
christophe leroy  writes:

> On 19/09/2016 at 07:50, Aneesh Kumar K.V wrote:
>>
>> Christophe Leroy  writes:
>>> +#else
>>> +static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
>>> +{
>>> +   BUG();
>>> +}
>>> +
>>>  #endif
>>
>>
>> I was expecting that BUG would get removed in the next patch, but I don't
>> see it there. Considering:
>>
>> @@ -475,11 +453,10 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift
>> for (i = 0; i < num_hugepd; i++, hpdp++)
>> hpdp->pd = 0;
>>
>> -#ifdef CONFIG_PPC_FSL_BOOK3E
>> -hugepd_free(tlb, hugepte);
>> -#else
>> -pgtable_free_tlb(tlb, hugepte, pdshift - shift);
>> -#endif
>> +if (shift >= pdshift)
>> +hugepd_free(tlb, hugepte);
>> +else
>> +pgtable_free_tlb(tlb, hugepte, pdshift - shift);
>>  }
>>
>> What is it that I am missing?
>>
>
> Previously, the call to hugepd_free() was compiled only under #ifdef
> CONFIG_PPC_FSL_BOOK3E.
> Now it is compiled at all times, but it should never be called outside
> CONFIG_PPC_FSL_BOOK3E because we always have shift < pdshift in that case.
> So the function needs to be defined anyway but should never be called.
> Should I just define it as an empty static inline?
>

For 8M with 4K mode, we have shift >= pdshift, right?
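
A quick worked check, using the geometry from the commit message (8xx in 4k
mode; the shifts below are the obvious log2 values, stated here only as an
illustration):

#include <stdio.h>

int main(void)
{
	unsigned int shift = 23;	/* 8M hugepage = 1 << 23 */
	unsigned int pdshift = 22;	/* each PGD entry covers 4M = 1 << 22 */

	if (shift >= pdshift)		/* true: the hugepd_free() path */
		printf("num_hugepd = %u\n", 1u << (shift - pdshift));	/* 2 */
	return 0;
}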

-aneesh



Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-19 Thread christophe leroy



On 19/09/2016 at 07:50, Aneesh Kumar K.V wrote:


Christophe Leroy  writes:

+#else
+static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
+{
+   BUG();
+}
+
 #endif



I was expecting that BUG would get removed in the next patch, but I don't
see it there. Considering:

@@ -475,11 +453,10 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift
for (i = 0; i < num_hugepd; i++, hpdp++)
hpdp->pd = 0;

-#ifdef CONFIG_PPC_FSL_BOOK3E
-   hugepd_free(tlb, hugepte);
-#else
-   pgtable_free_tlb(tlb, hugepte, pdshift - shift);
-#endif
+   if (shift >= pdshift)
+   hugepd_free(tlb, hugepte);
+   else
+   pgtable_free_tlb(tlb, hugepte, pdshift - shift);
 }

What is it that I am missing?



Previously, the call to hugepd_free() was compiled only under #ifdef
CONFIG_PPC_FSL_BOOK3E.
Now it is compiled at all times, but it should never be called outside
CONFIG_PPC_FSL_BOOK3E because we always have shift < pdshift in that case.
So the function needs to be defined anyway but should never be called.
Should I just define it as an empty static inline?


Christophe



Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-19 Thread christophe leroy



On 19/09/2016 at 07:45, Aneesh Kumar K.V wrote:

Christophe Leroy  writes:


Today there are two implementations of hugetlbpages which are managed
by exclusive #ifdefs:
* FSL_BOOKE: several directory entries point to the same single hugepage
* BOOK3S: one upper level directory entry points to a table of hugepages

In preparation for implementing hugepage support on the 8xx, we
need a mix of the two above solutions, because the 8xx needs both cases
depending on the size of pages:
* In 4k page size mode, each PGD entry covers a 4M bytes area. It means
that 2 PGD entries will be necessary to cover an 8M hugepage while a
single PGD entry will cover 8x 512k hugepages.
* In 16k page size mode, each PGD entry covers a 64M bytes area. It means
that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
hugepages will be covered by one PGD entry.

This patch:
* removes #ifdefs in favor of if/else based on the range sizes
* merges the two huge_pte_alloc() functions as they are pretty similar
* merges the two hugetlbpage_init() functions as they are pretty similar

Signed-off-by: Christophe Leroy 
---
v2: This part is new and results from a split of the last patch of the v1
series in two parts

 arch/powerpc/mm/hugetlbpage.c | 189 +-
 1 file changed, 77 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8a512b1..2119f00 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c


[...]




-#ifdef CONFIG_PPC_FSL_BOOK3E
 struct kmem_cache *hugepte_cache;
 static int __init hugetlbpage_init(void)
 {
int psize;

-   for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
-   unsigned shift;
-
-   if (!mmu_psize_defs[psize].shift)
-   continue;
-
-   shift = mmu_psize_to_shift(psize);
-
-   /* Don't treat normal page sizes as huge... */
-   if (shift != PAGE_SHIFT)
-   if (add_huge_page_size(1ULL << shift) < 0)
-   continue;
-   }
-
-   /*
-* Create a kmem cache for hugeptes.  The bottom bits in the pte have
-* size information encoded in them, so align them to allow this
-*/
-   hugepte_cache =  kmem_cache_create("hugepte-cache", sizeof(pte_t),
-  HUGEPD_SHIFT_MASK + 1, 0, NULL);
-   if (hugepte_cache == NULL)
-   panic("%s: Unable to create kmem cache for hugeptes\n",
- __func__);
-
-   /* Default hpage size = 4M */
-   if (mmu_psize_defs[MMU_PAGE_4M].shift)
-   HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_4M].shift;
-   else
-   panic("%s: Unable to set default huge page size\n", __func__);
-
-
-   return 0;
-}
-#else
-static int __init hugetlbpage_init(void)
-{
-   int psize;
-
+#if !defined(CONFIG_PPC_FSL_BOOK3E)
if (!radix_enabled() && !mmu_has_feature(MMU_FTR_16M_PAGE))
return -ENODEV;
-
+#endif


Do we need that #if? radix_enabled() should become 0 and that if
condition should be removed at compile time, shouldn't it? Or are you
finding errors with that?


With radix_enabled() being 0, it becomes:

if (!mmu_has_feature(MMU_FTR_16M_PAGE))
return -ENODEV;

which means hugepages would only be handled by CPUs having 16M pages.
That's the issue.
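
One conceivable alternative, sketched standalone with simplified stand-in
macros (the kernel's IS_ENABLED() and feature tests are more involved; this
only illustrates the compile-time folding being discussed):

#include <stdio.h>

#define CONFIG_PPC_FSL_BOOK3E 1			/* assumed: an FSL-style build */
#define IS_ENABLED(cfg) (cfg)			/* simplified stand-in */
static int radix_enabled(void) { return 0; }
static int mmu_has_feature_16M(void) { return 0; }

static int hugetlbpage_init_check(void)
{
	/* On FSL_BOOK3E the first operand is compile-time false, so the
	 * whole check vanishes -- same effect as the #if, without cpp. */
	if (!IS_ENABLED(CONFIG_PPC_FSL_BOOK3E) &&
	    !radix_enabled() && !mmu_has_feature_16M())
		return -1;	/* -ENODEV in the real code */
	return 0;
}

int main(void)
{
	printf("init check: %d\n", hugetlbpage_init_check());
	return 0;
}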






for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
unsigned shift;
unsigned pdshift;
@@ -860,16 +807,31 @@ static int __init hugetlbpage_init(void)
 * if we have pdshift and shift value same, we don't
 * use pgt cache for hugepd.
 */
-   if (pdshift != shift) {
+   if (pdshift > shift) {
pgtable_cache_add(pdshift - shift, NULL);
if (!PGT_CACHE(pdshift - shift))
panic("hugetlbpage_init(): could not create "
  "pgtable cache for %d bit pagesize\n", shift);
+   } else if (!hugepte_cache) {
+   /*
+* Create a kmem cache for hugeptes.  The bottom bits in
+* the pte have size information encoded in them, so
+* align them to allow this
+*/
+   hugepte_cache = kmem_cache_create("hugepte-cache",
+ sizeof(pte_t),
+ HUGEPD_SHIFT_MASK + 1,
+ 0, NULL);
+   if (hugepte_cache == NULL)
+   panic("%s: Unable to create kmem cache "
+ "for hugeptes\n", __func__);
+
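
As background on the alignment in the hunk above, a standalone sketch of
why the cache is aligned to HUGEPD_SHIFT_MASK + 1: the low bits of each
hugepte pointer are then guaranteed zero and can carry the page-size shift
(the mask value below is an assumed example; the kernel's differs per
config):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define HUGEPD_SHIFT_MASK 0x3f		/* assumed example value */

int main(void)
{
	/* stand-in for an allocation from the aligned kmem cache */
	void *pte = aligned_alloc(HUGEPD_SHIFT_MASK + 1, HUGEPD_SHIFT_MASK + 1);
	if (!pte)
		return 1;

	unsigned int pshift = 23;		/* e.g. an 8M hugepage */
	uintptr_t pd = (uintptr_t)pte | pshift;	/* encode size in low bits */

	printf("shift = %lu, table at %p\n",
	       (unsigned long)(pd & HUGEPD_SHIFT_MASK),
	       (void *)(pd & ~(uintptr_t)HUGEPD_SHIFT_MASK));
	free(pte);
	return 0;
}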



We don't need hugepte_cache for book3s 64K. I guess we will end up
creating one here?


Should not, because on book3s 64k, we will have pdshift > shift,
won't we?


Christophe


Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-18 Thread Aneesh Kumar K.V

Christophe Leroy  writes:
> +#else
> +static void hugepd_free(struct mmu_gather *tlb, void *hugepte)
> +{
> + BUG();
> +}
> +
>  #endif


I was expecting that BUG would get removed in the next patch, but I don't
see it there. Considering:

@@ -475,11 +453,10 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift
for (i = 0; i < num_hugepd; i++, hpdp++)
hpdp->pd = 0;

-#ifdef CONFIG_PPC_FSL_BOOK3E
-   hugepd_free(tlb, hugepte);
-#else
-   pgtable_free_tlb(tlb, hugepte, pdshift - shift);
-#endif
+   if (shift >= pdshift)
+   hugepd_free(tlb, hugepte);
+   else
+   pgtable_free_tlb(tlb, hugepte, pdshift - shift);
 }

What is it that I am missing?

-aneesh



Re: [PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-18 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> Today there are two implementations of hugetlbpages which are managed
> by exclusive #ifdefs:
> * FSL_BOOKE: several directory entries point to the same single hugepage
> * BOOK3S: one upper level directory entry points to a table of hugepages
>
> In preparation for implementing hugepage support on the 8xx, we
> need a mix of the two above solutions, because the 8xx needs both cases
> depending on the size of pages:
> * In 4k page size mode, each PGD entry covers a 4M bytes area. It means
> that 2 PGD entries will be necessary to cover an 8M hugepage while a
> single PGD entry will cover 8x 512k hugepages.
> * In 16k page size mode, each PGD entry covers a 64M bytes area. It means
> that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
> hugepages will be covered by one PGD entry.
>
> This patch:
> * removes #ifdefs in favor of if/else based on the range sizes
> * merges the two huge_pte_alloc() functions as they are pretty similar
> * merges the two hugetlbpage_init() functions as they are pretty similar
>
> Signed-off-by: Christophe Leroy 
> ---
> v2: This part is new and results from a split of the last patch of the v1
> series in two parts
>
>  arch/powerpc/mm/hugetlbpage.c | 189 +-
>  1 file changed, 77 insertions(+), 112 deletions(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 8a512b1..2119f00 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -64,14 +64,16 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>  {
>   struct kmem_cache *cachep;
>   pte_t *new;
> -
> -#ifdef CONFIG_PPC_FSL_BOOK3E
>   int i;
> - int num_hugepd = 1 << (pshift - pdshift);
> - cachep = hugepte_cache;
> -#else
> - cachep = PGT_CACHE(pdshift - pshift);
> -#endif
> + int num_hugepd;
> +
> + if (pshift >= pdshift) {
> + cachep = hugepte_cache;
> + num_hugepd = 1 << (pshift - pdshift);
> + } else {
> + cachep = PGT_CACHE(pdshift - pshift);
> + num_hugepd = 1;
> + }

Is there a way to hint the likely/unlikely branch based on the page size
selected at build time ?



>  
>   new = kmem_cache_zalloc(cachep, GFP_KERNEL);
>  
> @@ -89,7 +91,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>   smp_wmb();
>  
>   spin_lock(&mm->page_table_lock);
> -#ifdef CONFIG_PPC_FSL_BOOK3E
> +
>   /*
>* We have multiple higher-level entries that point to the same
>* actual pte location.  Fill in each as we go and backtrack on error.
> @@ -100,8 +102,13 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>   if (unlikely(!hugepd_none(*hpdp)))
>   break;
>   else



> -#ifdef CONFIG_PPC_FSL_BOOK3E
>  struct kmem_cache *hugepte_cache;
>  static int __init hugetlbpage_init(void)
>  {
>   int psize;
>  
> - for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
> - unsigned shift;
> -
> - if (!mmu_psize_defs[psize].shift)
> - continue;
> -
> - shift = mmu_psize_to_shift(psize);
> -
> - /* Don't treat normal page sizes as huge... */
> - if (shift != PAGE_SHIFT)
> - if (add_huge_page_size(1ULL << shift) < 0)
> - continue;
> - }
> -
> - /*
> -  * Create a kmem cache for hugeptes.  The bottom bits in the pte have
> -  * size information encoded in them, so align them to allow this
> -  */
> - hugepte_cache =  kmem_cache_create("hugepte-cache", sizeof(pte_t),
> -HUGEPD_SHIFT_MASK + 1, 0, NULL);
> - if (hugepte_cache == NULL)
> - panic("%s: Unable to create kmem cache for hugeptes\n",
> -   __func__);
> -
> - /* Default hpage size = 4M */
> - if (mmu_psize_defs[MMU_PAGE_4M].shift)
> - HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_4M].shift;
> - else
> - panic("%s: Unable to set default huge page size\n", __func__);
> -
> -
> - return 0;
> -}
> -#else
> -static int __init hugetlbpage_init(void)
> -{
> - int psize;
> -
> +#if !defined(CONFIG_PPC_FSL_BOOK3E)
>   if (!radix_enabled() && !mmu_has_feature(MMU_FTR_16M_PAGE))
>   return -ENODEV;
> -
> +#endif

Do we need that #if? radix_enabled() should become 0 and that if
condition should be removed at compile time, shouldn't it? Or are you
finding errors with that?


>   for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
>   unsigned shift;
>   unsigned pdshift;
> @@ -860,16 +807,31 @@ static int __init hugetlbpage_init(void)
>* if we have pdshift and shift value same, we don't
>* use pgt cache for hugepd.
>*/
> - if (pdshift != shift) {


[PATCH v2 2/3] powerpc: get hugetlbpage handling more generic

2016-09-16 Thread Christophe Leroy
Today there are two implementations of hugetlbpages which are managed
by exclusive #ifdefs:
* FSL_BOOKE: several directory entries point to the same single hugepage
* BOOK3S: one upper level directory entry points to a table of hugepages

In preparation for implementing hugepage support on the 8xx, we
need a mix of the two above solutions, because the 8xx needs both cases
depending on the size of pages:
* In 4k page size mode, each PGD entry covers a 4M bytes area. It means
that 2 PGD entries will be necessary to cover an 8M hugepage while a
single PGD entry will cover 8x 512k hugepages.
* In 16k page size mode, each PGD entry covers a 64M bytes area. It means
that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
hugepages will be covered by one PGD entry (see the worked sketch below).
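
A standalone sketch of the worked numbers behind the geometry above (the
shifts are the obvious log2 values, assumed rather than read from the
kernel tables):

#include <stdio.h>

int main(void)
{
	const struct { const char *mode; unsigned int pdshift, shift; } c[] = {
		{ "4k mode,  8M page",   22, 23 },	/* PGD entry covers 4M  */
		{ "4k mode,  512k page", 22, 19 },
		{ "16k mode, 8M page",   26, 23 },	/* PGD entry covers 64M */
	};
	for (unsigned int i = 0; i < sizeof(c) / sizeof(c[0]); i++) {
		if (c[i].shift >= c[i].pdshift)
			printf("%s: %u PGD entries per hugepage\n",
			       c[i].mode, 1u << (c[i].shift - c[i].pdshift));
		else
			printf("%s: %u hugepages per PGD entry\n",
			       c[i].mode, 1u << (c[i].pdshift - c[i].shift));
	}
	return 0;
}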

This patch:
* removes #ifdefs in favor of if/else based on the range sizes
* merges the two huge_pte_alloc() functions as they are pretty similar
* merges the two hugetlbpage_init() functions as they are pretty similar

Signed-off-by: Christophe Leroy 
---
v2: This part is new and results from a split of the last patch of the v1
series in two parts

 arch/powerpc/mm/hugetlbpage.c | 189 +-
 1 file changed, 77 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8a512b1..2119f00 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -64,14 +64,16 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 {
struct kmem_cache *cachep;
pte_t *new;
-
-#ifdef CONFIG_PPC_FSL_BOOK3E
int i;
-   int num_hugepd = 1 << (pshift - pdshift);
-   cachep = hugepte_cache;
-#else
-   cachep = PGT_CACHE(pdshift - pshift);
-#endif
+   int num_hugepd;
+
+   if (pshift >= pdshift) {
+   cachep = hugepte_cache;
+   num_hugepd = 1 << (pshift - pdshift);
+   } else {
+   cachep = PGT_CACHE(pdshift - pshift);
+   num_hugepd = 1;
+   }
 
new = kmem_cache_zalloc(cachep, GFP_KERNEL);
 
@@ -89,7 +91,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
smp_wmb();
 
spin_lock(&mm->page_table_lock);
-#ifdef CONFIG_PPC_FSL_BOOK3E
+
/*
 * We have multiple higher-level entries that point to the same
 * actual pte location.  Fill in each as we go and backtrack on error.
@@ -100,8 +102,13 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
if (unlikely(!hugepd_none(*hpdp)))
break;
else
+#ifdef CONFIG_PPC_BOOK3S_64
+   hpdp->pd = __pa(new) |
+  (shift_to_mmu_psize(pshift) << 2);
+#else
/* We use the old format for PPC_FSL_BOOK3E */
hpdp->pd = ((unsigned long)new & ~PD_HUGE) | pshift;
+#endif
}
/* If we bailed from the for loop early, an error occurred, clean up */
if (i < num_hugepd) {
@@ -109,17 +116,6 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
hpdp->pd = 0;
kmem_cache_free(cachep, new);
}
-#else
-   if (!hugepd_none(*hpdp))
-   kmem_cache_free(cachep, new);
-   else {
-#ifdef CONFIG_PPC_BOOK3S_64
-   hpdp->pd = __pa(new) | (shift_to_mmu_psize(pshift) << 2);
-#else
-   hpdp->pd = ((unsigned long)new & ~PD_HUGE) | pshift;
-#endif
-   }
-#endif
spin_unlock(&mm->page_table_lock);
return 0;
 }
@@ -136,7 +132,6 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 #define HUGEPD_PUD_SHIFT PMD_SHIFT
 #endif
 
-#ifdef CONFIG_PPC_BOOK3S_64
 /*
  * At this point we do the placement change only for BOOK3S 64. This would
  * possibly work on other subarchs.
@@ -153,6 +148,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
addr &= ~(sz-1);
pg = pgd_offset(mm, addr);
 
+#ifdef CONFIG_PPC_BOOK3S_64
if (pshift == PGDIR_SHIFT)
/* 16GB huge page */
return (pte_t *) pg;
@@ -178,32 +174,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
hpdp = (hugepd_t *)pm;
}
}
-   if (!hpdp)
-   return NULL;
-
-   BUG_ON(!hugepd_none(*hpdp) && !hugepd_ok(*hpdp));
-
-   if (hugepd_none(*hpdp) && __hugepte_alloc(mm, hpdp, addr, pdshift, pshift))
-   return NULL;
-
-   return hugepte_offset(*hpdp, addr, pdshift);
-}
-
 #else
-
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
-{
-   pgd_t *pg;
-   pud_t *pu;
-   pmd_t *pm;
-   hugepd_t *hpdp = NULL;
-   unsigned pshift = __ffs(sz);
-   unsigned pdshift = PGDIR_SHIFT;
-
-   addr &= ~(sz-1);
-
-   pg = pgd_offset(mm, addr);
-
 
