Re: [PATCH V2] powerpc/Kconfig: Update config option based on page size.

2016-02-18 Thread Balbir Singh


On 19/02/16 16:38, Rashmica Gupta wrote:
> Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
> FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.
>
> The error occurs because of the following constraint (from
> include/linux/mmzone.h) being violated:
>
>   MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS.
>
> Expanding this out, we get:
>
>   FORCE_MAX_ZONEORDER <= 25 - PAGE_SHIFT,
>
> which requires, for a 64K page, FORCE_MAX_ZONEORDER <= 9. Thus set the max
> value of FORCE_MAX_ZONEORDER for 64K pages to 9, and for 4K pages to 13.
>
> Also, check the minimum value:
> In include/linux/huge_mm.h, we have the constraint HPAGE_PMD_ORDER <
> MAX_ORDER which expands out to:
>
>   PTE_INDEX_SIZE < FORCE_MAX_ZONEORDER.
>
> PTE_INDEX_SIZE is:
>   9 (4k hash or no hash 4K pgtable) or
>   8 (64K hash or no hash 64K pgtable).
> Thus a min value of 8 for 64K pages and 9 for 4K pages is reasonable.
>
> So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 8-9 for 64K pages
> and from 13-64 to 9-13 for 4K pages.
>
> Signed-off-by: Rashmica Gupta 
> ---
>
> v2: Changed the range for 4K pages and minimum for 64K pages as suggested
> by Balbir Singh. 
>
>
>  arch/powerpc/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e4824fd04bb7..b933530821fb 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -585,9 +585,9 @@ endchoice
>  
>  config FORCE_MAX_ZONEORDER
>   int "Maximum zone order"
> - range 9 64 if PPC64 && PPC_64K_PAGES
> + range 8 9 if PPC64 && PPC_64K_PAGES
>   default "9" if PPC64 && PPC_64K_PAGES
> - range 13 64 if PPC64 && !PPC_64K_PAGES
> + range 9 13 if PPC64 && !PPC_64K_PAGES
>   default "13" if PPC64 && !PPC_64K_PAGES
>   range 9 64 if PPC32 && PPC_16K_PAGES
>   default "9" if PPC32 && PPC_16K_PAGES
Reviewed-by: Balbir Singh 

Balbir Singh

Re: [PATCH] powerpc/mm/hash: Clear the invalid slot information correctly

2016-02-18 Thread Anshuman Khandual
On 02/18/2016 10:14 PM, Aneesh Kumar K.V wrote:
> We can get a hash pte fault with 4k base page size and find the pte
> already inserted with 64K base page size. In that case we need to clear

Can you please elaborate on this? In which situations do we have a 64K
base page size on the PTE, but had inserted the HPTE with a 4K base
page size?

> the existing slot information from the old pte. Fix this correctly
> 
> With THP, we also clear the slot information for all the 64K hash
> PTEs mapping that 16MB page. They are all invalid now. This makes sure
> we don't find the slot valid when we fault with a 4K base page size.
> Finding the slot valid should not result in any wrong behaviour, because
> we check the validity again in the hash page table, but we can avoid
> that check completely.

Makes sense.

> 
> Fixes: a43c0eb8364c022 ("powerpc/mm: Convert 4k hash insert to C")
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/hash64_4k.c   |  2 +-
>  arch/powerpc/mm/hash64_64k.c  | 12 +---
>  arch/powerpc/mm/hugepage-hash64.c |  7 ++-
>  3 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
> index e7c04542ba62..e3e76b929f33 100644
> --- a/arch/powerpc/mm/hash64_4k.c
> +++ b/arch/powerpc/mm/hash64_4k.c
> @@ -106,7 +106,7 @@ repeat:
>   }
>   }
>   /*
> -  * Hypervisor failure. Restore old pmd and return -1
> +  * Hypervisor failure. Restore old pte and return -1

This change is not relevant here. Should be a separate patch.

>* similar to __hash_page_*
>*/
>   if (unlikely(slot == -2)) {
> diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
> index 0762c1e08c88..b3895720edb0 100644
> --- a/arch/powerpc/mm/hash64_64k.c
> +++ b/arch/powerpc/mm/hash64_64k.c
> @@ -111,7 +111,13 @@ int __hash_page_4K(unsigned long ea, unsigned long 
> access, unsigned long vsid,
>*/
>   if (!(old_pte & _PAGE_COMBO)) {
>   flush_hash_page(vpn, rpte, MMU_PAGE_64K, ssize, flags);
> - old_pte &= ~_PAGE_HASHPTE | _PAGE_F_GIX | _PAGE_F_SECOND;
> + /*
> +  * clear the old slot details from the old and new pte.
> +  * On hash insert failure we use old pte value and we don't
> +  * want slot information there if we have an insert failure.
> +  */
> + old_pte &= ~(_PAGE_HASHPTE | _PAGE_F_GIX | _PAGE_F_SECOND);
> + new_pte &= ~(_PAGE_HASHPTE | _PAGE_F_GIX | _PAGE_F_SECOND);

But why do we need to clear the bits on new_pte as well?

>   goto htab_insert_hpte;
>   }
>   /*
> @@ -182,7 +188,7 @@ repeat:
>   }
>   }
>   /*
> -  * Hypervisor failure. Restore old pmd and return -1
> +  * Hypervisor failure. Restore old pte and return -1

This change is not relevant here. Should be a separate patch.


>* similar to __hash_page_*
>*/
>   if (unlikely(slot == -2)) {
> @@ -305,7 +311,7 @@ repeat:
>   }
>   }
>   /*
> -  * Hypervisor failure. Restore old pmd and return -1
> +  * Hypervisor failure. Restore old pte and return -1
>* similar to __hash_page_*

Ditto.


[PATCH V2] powerpc/Kconfig: Update config option based on page size.

2016-02-18 Thread Rashmica Gupta
Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.

The error occurs because of the following constraint (from
include/linux/mmzone.h) being violated:

MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS.

Expanding this out, we get:

FORCE_MAX_ZONEORDER <= 25 - PAGE_SHIFT,

which requires, for a 64K page, FORCE_MAX_ZONEORDER <= 9. Thus set the max
value of FORCE_MAX_ZONEORDER for 64K pages to 9, and for 4K pages to 13.
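
For reference, the check in include/linux/mmzone.h that produces the compile
error looks roughly like this (paraphrased; SECTION_SIZE_BITS is 24 on ppc64,
which is where the 25 - PAGE_SHIFT bound above comes from):

	#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
	#error Allocator MAX_ORDER exceeds SECTION_SIZE
	#endif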

Also, check the minimum value:
In include/linux/huge_mm.h, we have the constraint HPAGE_PMD_ORDER <
MAX_ORDER which expands out to:

PTE_INDEX_SIZE < FORCE_MAX_ZONEORDER.

PTE_INDEX_SIZE is:
9 (4k hash or no hash 4K pgtable) or
8 (64K hash or no hash 64K pgtable).
Thus a min value of 8 for 64K pages and 9 for 4K pages is reasonable.

So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 8-9 for 64K pages
and from 13-64 to 9-13 for 4K pages.

Signed-off-by: Rashmica Gupta 
---

v2: Changed the range for 4K pages and minimum for 64K pages as suggested
by Balbir Singh. 


 arch/powerpc/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e4824fd04bb7..b933530821fb 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -585,9 +585,9 @@ endchoice
 
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
-   range 9 64 if PPC64 && PPC_64K_PAGES
+   range 8 9 if PPC64 && PPC_64K_PAGES
default "9" if PPC64 && PPC_64K_PAGES
-   range 13 64 if PPC64 && !PPC_64K_PAGES
+   range 9 13 if PPC64 && !PPC_64K_PAGES
default "13" if PPC64 && !PPC_64K_PAGES
range 9 64 if PPC32 && PPC_16K_PAGES
default "9" if PPC32 && PPC_16K_PAGES
-- 
2.5.0


Re: [RFC 2/4] powerpc/mm: Add comments to the vmemmap layout

2016-02-18 Thread Anshuman Khandual
On 02/18/2016 07:52 PM, Michael Ellerman wrote:
> On Wed, 2016-02-17 at 17:42 +0530, Anshuman Khandual wrote:
> 
>> Add some explanation of the layout of the vmemmap virtual address
>> space and how the physical page mapping is only used for valid PFNs
>> present at any point on the system.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h | 41 
>> 
>>  1 file changed, 41 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 8d1c41d..9db4a86 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -26,6 +26,47 @@
>>  #define IOREMAP_BASE(PHB_IO_END)
>>  #define IOREMAP_END (KERN_VIRT_START + KERN_VIRT_SIZE)
>>  
>> +/*
>> + * Starting address of the virtual address space where all page structs
> 
> This is so far from the variable it's referring to that it's not clear what it
> refers to. So you should say "vmemmap is the starting ..."
> 
>> + * for the system physical memory are stored under the vmemmap sparse
>   ^
> , when using the SPARSEMEM_VMEMMAP
>> + * memory model. All possible struct pages are logically stored in a
>> + * sequence in this virtual address space irrespective of the fact
>> + * whether any given PFN is valid or even the memory section is valid
>> + * or not.
> 
> I know what you mean but I think that could be worded better. But it's too late
> for me to reword it :)
> 
> The key point is that we allocate space for a page struct for each PFN that
> could be present in the system, including holes in the address space (hence
> sparse). That has the nice property of meaning there is a constant
> relationship between the address of a struct page and its PFN.
> 
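For reference, under SPARSEMEM_VMEMMAP the generic accessors capture exactly
that constant relationship; roughly, as in include/asm-generic/memory_model.h:

	#define __pfn_to_page(pfn)	(vmemmap + (pfn))
	#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)
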
>> + * During boot and memory hotplug add operation when new memory
>   ^   ^
> or  ,
>> + * sections are added, real physical allocation and hash table bolting
>   ^
> of struct pages
> 
>> + * will be performed. This saves precious physical memory when the system
>> + * really does not have valid PFNs in some address ranges.
> 
> 
>> + *
>> + *  vmemmap +--+
>> + * +|  page struct +--+  PFN is valid
>> + * |+--+  |
>> + * ||  page struct |  |  PFN is invalid
>> + * |+--+  |
>> + * ||  page struct +--+   |
>> + * |+--+  |   |
>> + * ||  page struct |  |   |
>> + * |+--+  |   |
>> + * ||  page struct |  |   |
>> + * |+--+  |   |
>> + * ||  page struct +--+   |   |
>> + * |+--+  |   |   |
>> + * ||  page struct |  |   |   |   +-+
>> + * |+--+  |   |   +-> | PFN |
>> + * ||  page struct |  |   |   +-+
>> + * |+--+  |   +-> | PFN |
>> + * ||  page struct |  |   +-+
>> + * |+--+  +-> | PFN |
>> + * ||  page struct |  +-+
>> + * |+--+   +> | PFN |
>> + * ||  page struct |   |  +-+
>> + * |+--+   |Bolted in hash table
>> + * ||  page struct +---+
>> + * v+--+
> 
> 
> The things on the right are not PFNs, they're struct pages. Each one
> corresponds to a PFN, but that relationship is derived from the vmemmap layout,
> not the physical layout.
> 
> I think it's more like:
> 
>   f000  c000 (and also 0x0)
> vmemmap +--+  +--+
>+|  page struct | +--> |  page struct |
>|+--+  +--+
>||  page struct | +--> |  page struct |
>|+--+ |+--+
>||  page struct | +   +--> |  page struct |
>|+--+ |+--+
>||  page struct | |   +--> |  page struct |
>|+--+ |   |+--+
>||  page struct | |   |
>|+--+ |   |
>||  page struct | |   |
>|+--+ |   |
>||  page struct | |   |
>|+--+ |   |
>||  page struct | |   |
>|+--+ |   |
>||  page 

Re: [RFC 4/4] powerpc/mm: Rename global tracker for virtual to physical mapping

2016-02-18 Thread Anshuman Khandual
On 02/18/2016 08:07 PM, Michael Ellerman wrote:
> On Wed, 2016-02-17 at 17:42 +0530, Anshuman Khandual wrote:
> 
>> This renames the global list which tracks all the virtual to physical
>> mapping and also the global list which tracks all the available unused
>> vmemmap_hw_map node structures.
> 
> But why? Why are the new names *so* much better that we would want to go
> through all this churn?

Hmm, okay. It's kind of subjective, but then it's up to you.

> 
>> It also attempts to explain the purpose
>> of these global linked lists and points out a possible race condition.
> 
> I'm happy to take the comments.

Sure, will send across next time around separately.


Re: [RFC 1/4] powerpc/mm: Rename variable to reflect start address of a section

2016-02-18 Thread Anshuman Khandual
On 02/18/2016 08:04 PM, Michael Ellerman wrote:
> On Wed, 2016-02-17 at 17:42 +0530, Anshuman Khandual wrote:
> 
>> The commit (16a05bff1: powerpc: start loop at section start of
>> start in vmemmap_populated()) reused the 'start' variable to compute
>> the starting address of the memory section where the given address
>> belongs. The same variable is then used to iterate over the starting
>> addresses of all memory sections before reaching the 'end' address.
>> Renaming it to 'section_start' makes the logic clearer.
>>
>> Fixes: 16a05bff1 ("powerpc: start loop at section start of start in 
>> vmemmap_populated()")
> 
> It's not a fix, just a cleanup. Fixes lines should be reserved for actual bug
> fixes.

Sure, got it.

> 
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/powerpc/mm/init_64.c | 12 
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>> index 379a6a9..d6b9b4d 100644
>> --- a/arch/powerpc/mm/init_64.c
>> +++ b/arch/powerpc/mm/init_64.c
>> @@ -170,11 +170,15 @@ static unsigned long __meminit 
>> vmemmap_section_start(unsigned long page)
>>   */
>>  static int __meminit vmemmap_populated(unsigned long start, int page_size)
>>  {
>> -unsigned long end = start + page_size;
>> -start = (unsigned long)(pfn_to_page(vmemmap_section_start(start)));
>> +unsigned long end, section_start;
>>  
>> -for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page)))
>> -if (pfn_valid(page_to_pfn((struct page *)start)))
>> +end = start + page_size;
>> +section_start = (unsigned long)(pfn_to_page
>> +(vmemmap_section_start(start)));
>> +
>> +for (; section_start < end; section_start
>> ++= (PAGES_PER_SECTION * sizeof(struct page)))
>> +if (pfn_valid(page_to_pfn((struct page *)section_start)))
>>  return 1;
>>  
>>  return 0;
> 
> That's not a big improvement.
> 
> But I think this code could be improved. There's a lot of casts, it seems to 
> be
> confused about whether it's iterating over addresses or struct pages.

Right, this patch just tries to clear up that confusion. That's all.
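
For illustration only, a minimal sketch (not the patch under review) of how the
loop could work on struct page pointers directly and drop most of the casts,
assuming the existing vmemmap_section_start() helper:

	static int __meminit vmemmap_populated(unsigned long start, int page_size)
	{
		/* 'start' is a vmemmap virtual address; convert it once */
		struct page *page = pfn_to_page(vmemmap_section_start(start));
		struct page *end = (struct page *)(start + page_size);

		/* advance one sparse memory section worth of struct pages at a time */
		for (; page < end; page += PAGES_PER_SECTION)
			if (pfn_valid(page_to_pfn(page)))
				return 1;

		return 0;
	}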


Re: [PATCH] powerpc/Kconfig: Update config option based on page size.

2016-02-18 Thread Rashmica



On 19/02/16 15:08, Balbir Singh wrote:


On 19/02/16 12:55, Rashmica Gupta wrote:

Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.

The error occurs because of the following constraint (from
include/linux/mmzone.h) being violated:

MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS.

IA64 has this cool hack

   #ifdef CONFIG_FORCE_MAX_ZONEORDER
   #if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
   #undef SECTION_SIZE_BITS
   #define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)
   #endif

But coming back (we can revisit the SECTION_SIZE_BITS definition later)
I feel like someone more senior than me should weigh in on whether this is
worth doing...



MAX_ORDER - 1 + 16 <= 24 for 64K

and

MAX_ORDER - 1 + 12 <= 24 for 4K

Your calculations are correct

Expanding this out, we get:

FORCE_MAX_ZONEORDER <= 25 - PAGE_SHIFT,

which requires, for a 64K page, FORCE_MAX_ZONEORDER <= 9. Thus
set the max value of FORCE_MAX_ZONEORDER for 64K pages to 9.

Also, check the minimum value:
In include/linux/huge_mm.h, we have the constraint HPAGE_PMD_ORDER <
MAX_ORDER which expands out to:

PTE_INDEX_SIZE < FORCE_MAX_ZONEORDER.
PTE_INDEX_SIZE is:
9 (4k hash or no hash 4K pgtable) or
8 (64K hash or no hash 64K pgtable).
Thus a min value of 9 for 64K pages is reasonable.

For 4K pages we end up with

9 < FORCE_MAX_ZONEORDER
FORCE_MAX_ZONEORDER - 1 + 12 <= 24

The range is 9 to 13

For 64K we end up with
8 < FORCE_MAX_ZONEORDER
FORCE_MAX_ZONEORDER - 1 + 16 <= 24, or FORCE_MAX_ZONEORDER <= 9

The range is really between 8 and 9 unless we tweak the SECTION_SIZE_BITS

Yup, you are right! Might have had a brain spaz...

So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 9-9.

Signed-off-by: Rashmica Gupta 
---
  arch/powerpc/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e4824fd04bb7..3bd3465b93ba 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -585,7 +585,7 @@ endchoice
  
  config FORCE_MAX_ZONEORDER

int "Maximum zone order"
-   range 9 64 if PPC64 && PPC_64K_PAGES
+   range 9 9 if PPC64 && PPC_64K_PAGES

range 8 9?

Agreed.

default "9" if PPC64 && PPC_64K_PAGES
range 13 64 if PPC64 && !PPC_64K_PAGES

Should this be fixed as well?
range 9 13?

Agreed.

default "13" if PPC64 && !PPC_64K_PAGES
Should the default values remain as is, or follow the trend and be equal 
to the minimum value?

Please check my calculations

Reviewed-by: Balbir Singh 

Balbir Singh



Re: [PATCH] powerpc/Kconfig: Update config option based on page size.

2016-02-18 Thread Balbir Singh


On 19/02/16 12:55, Rashmica Gupta wrote:
> Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
> FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.
>
> The error occurs because of the following constraint (from
> include/linux/mmzone.h) being violated:
>
>   MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS.
IA64 has this cool hack

  #ifdef CONFIG_FORCE_MAX_ZONEORDER
  #if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
  #undef SECTION_SIZE_BITS
  #define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)
  #endif

But coming back (we can revisit the SECTION_SIZE_BITS definition later)

MAX_ORDER - 1 + 16 <= 24 for 64K

and

MAX_ORDER - 1 + 12 <= 24 for 4K

Your calculations are correct
>
> Expanding this out, we get:
>
>   FORCE_MAX_ZONEORDER <= 25 - PAGE_SHIFT,
>
> which requires, for a 64K page, FORCE_MAX_ZONEORDER <= 9. Thus
> set the max value of FORCE_MAX_ZONEORDER for 64K pages to 9.
>
> Also, check the minimum value:
> In include/linux/huge_mm.h, we have the constraint HPAGE_PMD_ORDER <
> MAX_ORDER which expands out to:
>
>   PTE_INDEX_SIZE < FORCE_MAX_ZONEORDER.

> PTE_INDEX_SIZE is:
>   9 (4k hash or no hash 4K pgtable) or
>   8 (64K hash or no hash 64K pgtable).
> Thus a min value of 9 for 64K pages is reasonable.

For 4K pages we end up with

9 < FORCE_MAX_ZONEORDER
FORCE_MAX_ZONEORDER - 1 + 12 <= 24

The range is 9 to 13

For 64K we end up with
8 < FORCE_MAX_ZONEORDER
FORCE_MAX_ZONEORDER - 1 + 16 <= 24, or FORCE_MAX_ZONEORDER <= 9

The range is really between 8 and 9 unless we tweak the SECTION_SIZE_BITS
> So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 9-9.
>
> Signed-off-by: Rashmica Gupta 
> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e4824fd04bb7..3bd3465b93ba 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -585,7 +585,7 @@ endchoice
>  
>  config FORCE_MAX_ZONEORDER
>   int "Maximum zone order"
> - range 9 64 if PPC64 && PPC_64K_PAGES
> + range 9 9 if PPC64 && PPC_64K_PAGES
range 8 9?
>   default "9" if PPC64 && PPC_64K_PAGES
>   range 13 64 if PPC64 && !PPC_64K_PAGES
Should this be fixed as well?
range 9 13?
>   default "13" if PPC64 && !PPC_64K_PAGES

Please check my calculations

Reviewed-by: Balbir Singh 

Balbir Singh

Re: [PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()

2016-02-18 Thread Gavin Shan
On Wed, Feb 17, 2016 at 08:59:53AM -0600, Rob Herring wrote:
>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan  wrote:
>> This renames unflatten_dt_node() to unflatten_dt_nodes() as it
>> populates multiple device nodes from FDT blob. No logical changes
>> introduced.
>>
>> Signed-off-by: Gavin Shan 
>> ---
>>  drivers/of/fdt.c | 14 +++---
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>
>Acked-by: Rob Herring 
>
>I'm happy to take patches 40-42 for 4.6 if the rest of the series
>doesn't go in given they fix a separate problem. I just need to know
>soon (or at least they need to go into -next soon).
>

Thanks for the quick response, Rob. It depends on how many comments I
receive for the powerpc/powernv part. Apart from that, all parts including
this one have been ack'ed. I can discuss it with Michael Ellerman.
By the way, how soon do you need the decision to merge 40-42? If that's
one or two weeks away, I don't think the review of the whole series
can be done by then.

Also, I think you can probably merge 40-44 as they're all about
fdt.c. If they can be merged in one go, I needn't bother (cc)
you again when I send an updated revision. Thanks for your
review.

Thanks,
Gavin

>Rob
>


Re: [PATCH V3 01/30] mm: Make vm_get_page_prot arch specific.

2016-02-18 Thread Aneesh Kumar K.V
Dave Hansen  writes:

> On 02/18/2016 08:50 AM, Aneesh Kumar K.V wrote:
>> With the next generation Power processor, we have a new MMU model
>> [1] that requires us to maintain a different Linux page table format.
>> 
>> In order to support both current and future ppc64 systems with a single
>> kernel, we need to make sure the kernel can select between different page
>> table formats at runtime. With the new MMU (radix MMU) added, we will
>> have to dynamically switch between different protection maps. Hence
>> override vm_get_page_prot instead of using arch_vm_get_page_prot. We
>> also drop arch_vm_get_page_prot since only powerpc used it.
>
> Hi Aneesh,
>
> I've got some patches I'm hoping to get in to 4.6 that start using
> arch_vm_get_page_prot() on x86:
>
>> http://git.kernel.org/cgit/linux/kernel/git/daveh/x86-pkeys.git/commit/?h=pkeys-v024&id=aa1e61398fb598869981cfe48275cff832945669
>
> So I'd prefer that it stay in place. :)

Ok. I will update the patch to keep that.

-aneesh


Re: [PATCH V3 01/30] mm: Make vm_get_page_prot arch specific.

2016-02-18 Thread Aneesh Kumar K.V
Paul Mackerras  writes:

> On Thu, Feb 18, 2016 at 10:20:25PM +0530, Aneesh Kumar K.V wrote:
>> With the next generation Power processor, we have a new MMU model
>> [1] that requires us to maintain a different Linux page table format.
>> 
>> In order to support both current and future ppc64 systems with a single
>> kernel, we need to make sure the kernel can select between different page
>> table formats at runtime. With the new MMU (radix MMU) added, we will
>> have to dynamically switch between different protection maps. Hence
>> override vm_get_page_prot instead of using arch_vm_get_page_prot. We
>> also drop arch_vm_get_page_prot since only powerpc used it.
>
> This seems like unnecessary churn to me.  Let's just make hash use the
> same values as radix for things like _PAGE_RW, _PAGE_EXEC etc., and
> then we don't need any of this.
>

I was hoping to do that after this series. Something similar to

https://github.com/kvaneesh/linux/commit/0c2ac1328b678a6e187d1f2644a007204c59a047

"
powerpc/mm: Add helper for page flag access in ioremap_at

Instead of using variables, we use static inlines which get patched during
boot to either the hash or the radix version.
"

That gives us a base to revert patches if we find issues with hash and
still have a working radix base. So the idea is to introduce radix with minimal
changes to hash, and then consolidate hash and radix as much as we can by
updating the hash Linux page table format.

-aneesh


Re: [PATCH V3 00/30] Book3s abstraction in preparation for new MMU model

2016-02-18 Thread Aneesh Kumar K.V
Paul Mackerras  writes:

> On Thu, Feb 18, 2016 at 10:20:24PM +0530, Aneesh Kumar K.V wrote:
>> Hello,
>> 
>> This is a large series, mostly consisting of code movement. No new features
>> are added in this series. The changes are done to accommodate the upcoming new
>> memory model in future powerpc chips. The details of the new MMU model can be
>> found at
>> 
>>  http://ibm.biz/power-isa3 (Needs registration). I am including a summary of
>> the changes below.
>
> This doesn't apply against Linus' current tree - have you already
> posted the prerequisite patches?  If so, what's the subject of the
> 0/N patch of the prerequisite series?


I would suggest using github to get the tree. Yes, I have some dependent
patches and they are not in a single series.

https://github.com/kvaneesh/linux/commits/radix-mmu-v2

Most of the dependent patches are already in mpe/fixes, and the reason to put
them in the branch is to avoid patch-apply issues if we are planning to
take this for the next merge window. Since this series involves a lot of code
movement, I was worried about errors during cherry-pick/conflict resolution.


-aneesh


[PATCH] powerpc/Kconfig: Update config option based on page size.

2016-02-18 Thread Rashmica Gupta
Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.

The error occurs because of the following constraint (from
include/linux/mmzone.h) being violated:

MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS.

Expanding this out, we get:

FORCE_MAX_ZONEORDER <= 25 - PAGE_SHIFT,

which requires, for a 64K page, FORCE_MAX_ZONEORDER <= 9. Thus
set the max value of FORCE_MAX_ZONEORDER for 64K pages to 9.

Also, check the minimum value:
In include/linux/huge_mm.h, we have the constraint HPAGE_PMD_ORDER <
MAX_ORDER which expands out to:

PTE_INDEX_SIZE < FORCE_MAX_ZONEORDER.

PTE_INDEX_SIZE is:
9 (4k hash or no hash 4K pgtable) or
8 (64K hash or no hash 64K pgtable).
Thus a min value of 9 for 64K pages is reasonable.

So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 9-9.

Signed-off-by: Rashmica Gupta 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e4824fd04bb7..3bd3465b93ba 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -585,7 +585,7 @@ endchoice
 
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
-   range 9 64 if PPC64 && PPC_64K_PAGES
+   range 9 9 if PPC64 && PPC_64K_PAGES
default "9" if PPC64 && PPC_64K_PAGES
range 13 64 if PPC64 && !PPC_64K_PAGES
default "13" if PPC64 && !PPC_64K_PAGES
-- 
2.5.0


Re: [PATCH V3 01/30] mm: Make vm_get_page_prot arch specific.

2016-02-18 Thread Dave Hansen
On 02/18/2016 08:50 AM, Aneesh Kumar K.V wrote:
> With the next generation Power processor, we have a new MMU model
> [1] that requires us to maintain a different Linux page table format.
> 
> In order to support both current and future ppc64 systems with a single
> kernel, we need to make sure the kernel can select between different page
> table formats at runtime. With the new MMU (radix MMU) added, we will
> have to dynamically switch between different protection maps. Hence
> override vm_get_page_prot instead of using arch_vm_get_page_prot. We
> also drop arch_vm_get_page_prot since only powerpc used it.

Hi Aneesh,

I've got some patches I'm hoping to get in to 4.6 that start using
arch_vm_get_page_prot() on x86:

> http://git.kernel.org/cgit/linux/kernel/git/daveh/x86-pkeys.git/commit/?h=pkeys-v024&id=aa1e61398fb598869981cfe48275cff832945669

So I'd prefer that it stay in place. :)
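
For context, the generic vm_get_page_prot() builds the protection bits from the
global protection_map; a hedged sketch of what a powerpc override could look
like (radix_enabled() and the radix lookup below are placeholders, not code
from either series):

	pgprot_t vm_get_page_prot(unsigned long vm_flags)
	{
		if (radix_enabled())		/* placeholder predicate */
			return radix_vm_get_page_prot(vm_flags);	/* hypothetical */

		return __pgprot(pgprot_val(protection_map[vm_flags &
				(VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)]));
	}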

[PATCH v2 3/3] powerpc: Add POWER9 cputable entry

2016-02-18 Thread Michael Neuling
Add a cputable entry for POWER9.  More code is required to actually
boot and run on a POWER9, but this gets the base piece in, which we can
start building on.

Copies over from POWER8 except for:
- Adds a new CPU_FTR_ARCH_300 bit to start hanging new architecture
   features from (in subsequent patches).
- Advertises new user features bits PPC_FEATURE2_ARCH_3_00 &
  HAS_IEEE128 when on POWER9.
- Drops CPU_FTR_SUBCORE.
- Drops PMU code and machine check.
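
As a rough illustration (not part of this patch), follow-up code can then guard
ISA 3.00-only paths on the new bit; the helper called below is hypothetical:

	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
		/* hypothetical: set up a facility that only exists on ISA 3.00 */
		setup_isa_300_only_feature();
	}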

Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/cputable.h   | 17 +++---
 arch/powerpc/include/asm/mmu-hash64.h |  1 +
 arch/powerpc/include/asm/mmu.h|  1 +
 arch/powerpc/kernel/cpu_setup_power.S | 44 +++
 arch/powerpc/kernel/cputable.c| 27 +
 arch/powerpc/kernel/mce_power.c   | 17 +++---
 6 files changed, 95 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index a47e175..94ace9b 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -171,7 +171,7 @@ enum {
 #define CPU_FTR_ARCH_201   LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_206   LONG_ASM_CONST(0x0004)
 #define CPU_FTR_ARCH_207S  LONG_ASM_CONST(0x0008)
-/* Free				LONG_ASM_CONST(0x0010) */
+#define CPU_FTR_ARCH_300   LONG_ASM_CONST(0x0010)
 #define CPU_FTR_MMCRA  LONG_ASM_CONST(0x0020)
 #define CPU_FTR_CTRL   LONG_ASM_CONST(0x0040)
 #define CPU_FTR_SMTLONG_ASM_CONST(0x0080)
@@ -447,6 +447,16 @@ enum {
CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_SUBCORE)
 #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
 #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
+#define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
+   CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
+   CPU_FTR_MMCRA | CPU_FTR_SMT | \
+   CPU_FTR_COHERENT_ICACHE | \
+   CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
+   CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
+   CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
+   CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
+   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -465,7 +475,7 @@ enum {
(CPU_FTRS_POWER4 | CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
 CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
-CPU_FTRS_PA6T | CPU_FTR_VSX)
+CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9)
 #endif
 #else
 enum {
@@ -516,7 +526,8 @@ enum {
(CPU_FTRS_POWER4 & CPU_FTRS_PPC970 & CPU_FTRS_POWER5 & \
 CPU_FTRS_POWER6 & CPU_FTRS_POWER7 & CPU_FTRS_CELL & \
 CPU_FTRS_PA6T & CPU_FTRS_POWER8 & CPU_FTRS_POWER8E & \
-CPU_FTRS_POWER8_DD1 & ~CPU_FTR_HVMODE & CPU_FTRS_POSSIBLE)
+CPU_FTRS_POWER8_DD1 & ~CPU_FTR_HVMODE & CPU_FTRS_POSSIBLE & \
+CPU_FTRS_POWER9)
 #endif
 #else
 enum {
diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
b/arch/powerpc/include/asm/mmu-hash64.h
index 7352d3f..e36dc90 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -114,6 +114,7 @@
 
 #define POWER7_TLB_SETS128 /* # sets in POWER7 TLB */
 #define POWER8_TLB_SETS512 /* # sets in POWER8 TLB */
+#define POWER9_TLB_SETS_HASH   256 /* # sets in POWER9 TLB Hash mode */
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 3d5abfe..54d4650 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -97,6 +97,7 @@
 #define MMU_FTRS_POWER6MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
 #define MMU_FTRS_POWER7MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
 #define MMU_FTRS_POWER8MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
+#define MMU_FTRS_POWER9MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
 #define MMU_FTRS_CELL  MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \
MMU_FTR_CI_LARGE_PAGE
 #define MMU_FTRS_PA6T  MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \
diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S
index cb3e272..5932219 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -84,6 +84,39 @@ 

[PATCH v2 2/3] powerpc: Use defines for __init_tlb_power[78]

2016-02-18 Thread Michael Neuling
Use defines for the literals in __init_tlb_power[78] rather than hand-coding
them.

Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/cpu_setup_power.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S
index 9c9b741..cb3e272 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Entry: r3 = crap, r4 = ptr to cputable entry
  *
@@ -139,7 +140,7 @@ __init_HFSCR:
  * (invalidate by congruence class). P7 has 128 CCs., P8 has 512.
  */
 __init_tlb_power7:
-   li  r6,128
+   li  r6,POWER7_TLB_SETS
mtctr   r6
li  r7,0xc00/* IS field = 0b11 */
ptesync
@@ -150,7 +151,7 @@ __init_tlb_power7:
 1: blr
 
 __init_tlb_power8:
-   li  r6,512
+   li  r6,POWER8_TLB_SETS
mtctr   r6
li  r7,0xc00/* IS field = 0b11 */
ptesync
-- 
2.5.0


[PATCH v2 1/3] powerpc/powernv: Create separate subcores CPU feature bit

2016-02-18 Thread Michael Neuling
Subcores isn't really part of the 2.07 architecture, but currently we
turn it on using the 2.07 feature bit.  Subcores is really a POWER8-specific
feature.

This adds a new CPU_FTR bit just for subcores and moves the subcore
init code over to use this.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/cputable.h  | 3 ++-
 arch/powerpc/platforms/powernv/subcore.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index b118072..a47e175 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -196,6 +196,7 @@ enum {
 #define CPU_FTR_DAWR   LONG_ASM_CONST(0x0400)
 #define CPU_FTR_DABRX  LONG_ASM_CONST(0x0800)
 #define CPU_FTR_PMAO_BUG   LONG_ASM_CONST(0x1000)
+#define CPU_FTR_SUBCORE			LONG_ASM_CONST(0x2000)
 
 #ifndef __ASSEMBLY__
 
@@ -443,7 +444,7 @@ enum {
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
-   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
+   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_SUBCORE)
 #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
 #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
diff --git a/arch/powerpc/platforms/powernv/subcore.c 
b/arch/powerpc/platforms/powernv/subcore.c
index 503a73f..0babef1 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -407,7 +407,7 @@ static DEVICE_ATTR(subcores_per_core, 0644,
 
 static int subcore_init(void)
 {
-   if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+   if (!cpu_has_feature(CPU_FTR_SUBCORE))
return 0;
 
/*
-- 
2.5.0


[PATCH v2 0/3] powerpc: Add POWER9 cputable entry

2016-02-18 Thread Michael Neuling
Add CPU table entry for POWER9

v2:
 Updates based on comments from mpe:
  - reuse user features from POWER8
  - remove "Hacked up" from comment
  - gave oprofile name POWER9
  - removed untested power8 machine check hook
  - used defines for tlb init code
  - removed pmu init from setup code
  - added POWER9 to CPU_FTRS_ALWAYS
  - reworded comment on common flush_tlb_* code
  - moved to CPU_FTR_ARCH_300 to be consistent with user bit

Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0

2016-02-18 Thread Scott Wood
On Thu, 2016-02-18 at 15:18 -0600, Leo Li wrote:
> On Wed, Feb 17, 2016 at 5:24 AM, Raghav Dogra  wrote:
> > The new IFC controller version 2.0 has a different memory map page.
> > Up to IFC 1.4 the PAGE size is 4KB, and from IFC 2.0 the PAGE size is 64KB.
> > This patch segregates the IFC global and runtime registers into the
> > appropriate PAGE sizes.
> > 
> > Signed-off-by: Jaiprakash Singh 
> > Signed-off-by: Raghav Dogra 
> > Acked-by: Li Yang 
> > Signed-off-by: Raghav Dogra 
> > ---
> > Changes for v3: not dependent on
> > "drivers/memory: Add deep sleep support for IFC" patch
> > 
> > Changes for v2: rebased to resolve conflicts
> > Applicable to git://git.infradead.org/l2-mtd.git
> > 
> > This patch is dependent on "drivers/memory: Add deep sleep support for
> > IFC"
> > https://patchwork.ozlabs.org/patch/582762/
> > which is also applicable to git://git.infradead.org/l2-mtd.git
> 
> This patch seems to be in good shape, but the dependency is still
> having quite some feedback to be addressed.  Depending on it will
> greatly delay the time that this critical patch for LS2 to be merged.
> Could you remove the dependency like Scott already suggested?

According to the changelog the dependency has been removed (it would have been
clearer to remove this comment as well...).

-Scott


Re: [PATCH V3 01/30] mm: Make vm_get_page_prot arch specific.

2016-02-18 Thread Paul Mackerras
On Thu, Feb 18, 2016 at 10:20:25PM +0530, Aneesh Kumar K.V wrote:
> With the next generation Power processor, we have a new MMU model
> [1] that requires us to maintain a different Linux page table format.
> 
> In order to support both current and future ppc64 systems with a single
> kernel, we need to make sure the kernel can select between different page
> table formats at runtime. With the new MMU (radix MMU) added, we will
> have to dynamically switch between different protection maps. Hence
> override vm_get_page_prot instead of using arch_vm_get_page_prot. We
> also drop arch_vm_get_page_prot since only powerpc used it.

This seems like unnecessary churn to me.  Let's just make hash use the
same values as radix for things like _PAGE_RW, _PAGE_EXEC etc., and
then we don't need any of this.

Paul.

Re: [PATCH V3 00/30] Book3s abstraction in preparation for new MMU model

2016-02-18 Thread Paul Mackerras
On Thu, Feb 18, 2016 at 10:20:24PM +0530, Aneesh Kumar K.V wrote:
> Hello,
> 
> This is a large series, mostly consisting of code movement. No new features
> are added in this series. The changes are done to accommodate the upcoming new
> memory model in future powerpc chips. The details of the new MMU model can be
> found at
> 
>  http://ibm.biz/power-isa3 (Needs registration). I am including a summary of
> the changes below.

This doesn't apply against Linus' current tree - have you already
posted the prerequisite patches?  If so, what's the subject of the
0/N patch of the prerequisite series?

> ISA 3.0 adds support for the radix tree style of MMU with full
> virtualization and related control mechanisms that manage its
> coexistence with the HPT. Radix-using operating systems will
> manage their own translation tables instead of relying on hcalls.
> 
> The radix-style MMU model requires us to use a 4-level page table
> with 64K and 4K page sizes. The table index sizes for the different page
> sizes are listed below:
> 
> PGD -> 13 bits
> PUD -> 9 (1G hugepage)
> PMD -> 9 (2M huge page)
> PTE -> 5 (for 64k), 9 (for 4k)
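(Working those index sizes out for the 64K case: 2^5 PTEs x 64K = 2M mapped by
each PMD entry, 2^9 x 2M = 1G by each PUD entry, 2^9 x 1G = 512G by each PGD
entry, and 2^13 PGD entries cover 4PB of virtual address space; the 4K case
reaches the same 2M point via 2^9 x 4K.)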
> 
> We also require the page table to be in big endian format.
> 
> The changes proposed in this series enables us to support both
> hash page table and radix tree style MMU using a single kernel
> with limited impact. The idea is to change core page table
> accessors to static inline functions and later hotpatch them
> to switch to hash or radix tree functions. For ex:
> 
> static inline int pte_write(pte_t pte)
> {
>if (radix_enabled())
>return rpte_write(pte);
> return hlpte_write(pte);
> }

Given that with a hash-based MMU, the Linux page tables are purely a
software construct, I don't see why this complexity is necessary.  We
can make the PTE have the same format on radix and hash instead.  I
have a patch series that does that almost ready to post.

> On boot we will hotpatch the code so as to avoid conditional operation.
> 
> The other two major changes proposed in this series are to switch the hash
> Linux page table to a 4-level table and to keep it in big endian format. This is
> done so that functions like pte_val() and pud_populate() don't need
> hotpatching, and thereby helps in limiting the runtime impact of the changes.

Right, I agree with this.

Paul.
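
On the big endian page table point, a hedged sketch of the kind of accessor
that change implies (the field name and exact form are illustrative, not taken
from the series):

	static inline unsigned long pte_val(pte_t pte)
	{
		/* page table entries are kept big endian in memory */
		return be64_to_cpu(pte.pte);
	}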

Re: [PATCH][v3] mtd/ifc: Add support for IFC controller version 2.0

2016-02-18 Thread Leo Li
On Wed, Feb 17, 2016 at 5:24 AM, Raghav Dogra  wrote:
> The new IFC controller version 2.0 has a different memory map page.
> Upto IFC 1.4 PAGE size is 4 KB and from IFC2.0 PAGE size is 64KB.
> This patch segregates the IFC global and runtime registers to appropriate
> PAGE sizes.
>
> Signed-off-by: Jaiprakash Singh 
> Signed-off-by: Raghav Dogra 
> Acked-by: Li Yang 
> Signed-off-by: Raghav Dogra 
> ---
> Changes for v3: not dependent on
> "drivers/memory: Add deep sleep support for IFC" patch
>
> Changes for v2: rebased to resolve conflicts
> Applicable to git://git.infradead.org/l2-mtd.git
>
> This patch is dependent on "drivers/memory: Add deep sleep support for IFC"
> https://patchwork.ozlabs.org/patch/582762/
> which is also applicable to git://git.infradead.org/l2-mtd.git

This patch seems to be in good shape, but the dependency is still
having quite some feedback to be addressed.  Depending on it will
greatly delay the time that this critical patch for LS2 to be merged.
Could you remove the dependency like Scott already suggested?

Regards,
Leo

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-18 Thread Kirill A. Shutemov
On Thu, Feb 18, 2016 at 04:00:37PM +0100, Gerald Schaefer wrote:
> On Thu, 18 Feb 2016 01:58:08 +0200
> "Kirill A. Shutemov"  wrote:
> 
> > On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > > On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> > > Sebastian Ott  wrote:
> > > 
> > > > [   59.875935] [ cut here ]
> > > > [   59.875937] kernel BUG at mm/huge_memory.c:2884!
> > > > [   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP 
> > > > DEBUG_PAGEALLOC
> > > > [   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en 
> > > > vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad 
> > > > ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 
> > > > des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common 
> > > > genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap 
> > > > eadm_sch macvlan kvm autofs4
> > > > [   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: GW   
> > > > 4.4.0-07794-ga4eff16-dirty #77
> > > > [   59.876036] task: d2312948 ti: cfecc000 task.ti: 
> > > > cfecc000
> > > > [   59.876039] Krnl PSW : 0704d0018000 002bf3aa 
> > > > (__split_huge_pmd_locked+0x562/0xa10)
> > > > [   59.876045]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 
> > > > PM:0 EA:3
> > > >Krnl GPRS: 01a7a1cf 03d10177c000 
> > > > 00044068 5df00215
> > > > [   59.876051]0001 0001 
> > > >  774e6900
> > > > [   59.876054]03ff5200 6d403b10 
> > > > 6e1eb800 03ff51f0
> > > > [   59.876058]03d10177c000 00715190 
> > > > 002bf234 cfecfb58
> > > > [   59.876068] Krnl Code: 002bf39c: d507d010a000clc 
> > > > 16(8,%%r13),0(%%r10)
> > > >   002bf3a2: a7840004brc 
> > > > 8,2bf3aa
> > > >  #002bf3a6: a7f40001brc 
> > > > 15,2bf3a8
> > > >  >002bf3aa: 91407440tm  
> > > > 1088(%%r7),64
> > > >   002bf3ae: a7840208brc 
> > > > 8,2bf7be
> > > >   002bf3b2: a7f401e9brc 
> > > > 15,2bf784
> > > >   002bf3b6: 9104a006tm  
> > > > 6(%%r10),4
> > > >   002bf3ba: a7740004brc 
> > > > 7,2bf3c2
> > > > [   59.876089] Call Trace:
> > > > [   59.876092] ([<002bf234>] 
> > > > __split_huge_pmd_locked+0x3ec/0xa10)
> > > > [   59.876095]  [<002c4310>] __split_huge_pmd+0x118/0x218
> > > > [   59.876099]  [<002810e8>] unmap_single_vma+0x2d8/0xb40
> > > > [   59.876102]  [<00282d66>] zap_page_range+0x116/0x318
> > > > [   59.876105]  [<0029b834>] SyS_madvise+0x23c/0x5e8
> > > > [   59.876108]  [<006f9f56>] system_call+0xd6/0x258
> > > > [   59.876111]  [<03ff9bbfd282>] 0x3ff9bbfd282
> > > > [   59.876113] INFO: lockdep is turned off.
> > > > [   59.876115] Last Breaking-Event-Address:
> > > > [   59.876118]  [<002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> > > 
> > > The BUG at mm/huge_memory.c:2884 is interesting, it's the 
> > > BUG_ON(!pte_none(*pte))
> > > check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> > > pagetables to be empty, but in collapse_huge_page() we deposit the 
> > > original
> > > pagetable instead of allocating a new (empty) one. This saves an 
> > > allocation,
> > > which is good, but doesn't that mean that if such a collapsed hugepage 
> > > will
> > > ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> > > of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> > > 
> > > This behavior is not new, it was the same before the THP rework, so I do 
> > > not
> > > assume that it is related to the current problems, maybe with the 
> > > exception
> > > of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 
> > > myself,
> > > and the other crashes probably cannot be explained with this. Maybe I am
> > > also missing something, but I do not see how collapse_huge_page() and the
> > > (non-empty) pgtable deposit there can work out with the 
> > > BUG_ON(!pte_none(*pte))
> > > checks. Any thoughts?
> > 
> > I don't think there's a problem: ptes in the pgtable are cleared with
> > pte_clear() in __collapse_huge_page_copy().
> > 
> 
> Ah OK, I didn't see that. Still the BUG_ON() tells us that something went
> wrong with the pre-allocated pagetable, or at least with the deposit/withdraw
> list, or both. Given that on s390 we keep the listheads for the 
> deposit/withdraw
> list inside the pre-allocated pgtables, instead of the struct pages, it may
> also 

[PATCH V3 29/30] powerpc/mm: Hash linux abstraction for tlbflush routines

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 28 ++-
 arch/powerpc/include/asm/book3s/64/tlbflush.h  | 56 ++
 arch/powerpc/include/asm/tlbflush.h|  2 +-
 arch/powerpc/mm/tlb_hash64.c   |  2 +-
 4 files changed, 73 insertions(+), 15 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/tlbflush.h

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index 1b753f96b374..ddce8477fe0c 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -52,40 +52,42 @@ extern void flush_hash_range(unsigned long number, int 
local);
 extern void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
pmd_t *pmdp, unsigned int psize, int ssize,
unsigned long flags);
-
-static inline void local_flush_tlb_mm(struct mm_struct *mm)
+static inline void local_flush_hltlb_mm(struct mm_struct *mm)
 {
 }
 
-static inline void flush_tlb_mm(struct mm_struct *mm)
+static inline void flush_hltlb_mm(struct mm_struct *mm)
 {
 }
 
-static inline void local_flush_tlb_page(struct vm_area_struct *vma,
-   unsigned long vmaddr)
+static inline void local_flush_hltlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
 {
 }
 
-static inline void flush_tlb_page(struct vm_area_struct *vma,
- unsigned long vmaddr)
+static inline void flush_hltlb_page(struct vm_area_struct *vma,
+   unsigned long vmaddr)
 {
 }
 
-static inline void flush_tlb_page_nohash(struct vm_area_struct *vma,
-unsigned long vmaddr)
+static inline void flush_hltlb_page_nohash(struct vm_area_struct *vma,
+  unsigned long vmaddr)
 {
 }
 
-static inline void flush_tlb_range(struct vm_area_struct *vma,
-  unsigned long start, unsigned long end)
+static inline void flush_hltlb_range(struct vm_area_struct *vma,
+unsigned long start, unsigned long end)
 {
 }
 
-static inline void flush_tlb_kernel_range(unsigned long start,
- unsigned long end)
+static inline void flush_hltlb_kernel_range(unsigned long start,
+   unsigned long end)
 {
 }
 
+
+struct mmu_gather;
+extern void hltlb_flush(struct mmu_gather *tlb);
 /* Private function for use by PCI IO mapping code */
 extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
 unsigned long end);
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush.h
new file mode 100644
index ..37d7f289ad42
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -0,0 +1,56 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_TLBFLUSH_H
+#define _ASM_POWERPC_BOOK3S_64_TLBFLUSH_H
+
+#include 
+
+static inline void flush_tlb_range(struct vm_area_struct *vma,
+  unsigned long start, unsigned long end)
+{
+   return flush_hltlb_range(vma, start, end);
+}
+
+static inline void flush_tlb_kernel_range(unsigned long start,
+ unsigned long end)
+{
+   return flush_hltlb_kernel_range(start, end);
+}
+
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+   return local_flush_hltlb_mm(mm);
+}
+
+static inline void local_flush_tlb_page(struct vm_area_struct *vma,
+   unsigned long vmaddr)
+{
+   return local_flush_hltlb_page(vma, vmaddr);
+}
+
+static inline void flush_tlb_page_nohash(struct vm_area_struct *vma,
+unsigned long vmaddr)
+{
+   return flush_hltlb_page_nohash(vma, vmaddr);
+}
+
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+   return hltlb_flush(tlb);
+}
+
+#ifdef CONFIG_SMP
+static inline void flush_tlb_mm(struct mm_struct *mm)
+{
+   return flush_hltlb_mm(mm);
+}
+
+static inline void flush_tlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+   return flush_hltlb_page(vma, vmaddr);
+}
+#else
+#define flush_tlb_mm(mm)   local_flush_tlb_mm(mm)
+#define flush_tlb_page(vma, addr)  local_flush_tlb_page(vma, addr)
+#endif /* CONFIG_SMP */
+
+#endif /*  _ASM_POWERPC_BOOK3S_64_TLBFLUSH_H */
diff --git a/arch/powerpc/include/asm/tlbflush.h 
b/arch/powerpc/include/asm/tlbflush.h
index 9f77f85e3e99..2fc4331c5bc5 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -78,7 +78,7 @@ static inline void local_flush_tlb_mm(struct 

[PATCH V3 30/30] powerpc/mm: Hash linux abstraction for pte swap encoding

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 35 +++--
 arch/powerpc/include/asm/book3s/64/pgtable.h | 57 
 arch/powerpc/mm/slb.c|  1 -
 3 files changed, 70 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index c9403f94c9fc..03c87166b3b8 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -234,34 +234,25 @@
 #define hlpmd_index(address) (((address) >> (H_PMD_SHIFT)) & (H_PTRS_PER_PMD - 
1))
 #define hlpte_index(address) (((address) >> (PAGE_SHIFT)) & (H_PTRS_PER_PTE - 
1))
 
-/* Encode and de-code a swap entry */
-#define MAX_SWAPFILES_CHECK() do { \
-   BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
-   /*  \
-* Don't have overlapping bits with _PAGE_HPTEFLAGS \
-* We filter HPTEFLAGS on set_pte.  \
-*/ \
-   BUILD_BUG_ON(H_PAGE_HPTEFLAGS & (0x1f << H_PAGE_BIT_SWAP_TYPE)); \
-   BUILD_BUG_ON(H_PAGE_HPTEFLAGS & H_PAGE_SWP_SOFT_DIRTY); \
-   } while (0)
 /*
  * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
+ * We encode swap type in the lower part of pte, skipping the lowest two bits.
+ * Offset is encoded as pfn.
  */
-#define SWP_TYPE_BITS 5
-#define __swp_type(x)  (((x).val >> H_PAGE_BIT_SWAP_TYPE) \
-   & ((1UL << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)((x).val >> H_PTE_RPN_SHIFT)
-#define __swp_entry(type, offset)  ((swp_entry_t) { \
-   ((type) << H_PAGE_BIT_SWAP_TYPE) \
-   | ((offset) << H_PTE_RPN_SHIFT) })
+#define hl_swp_type(x) (((x).val >> H_PAGE_BIT_SWAP_TYPE)  \
+& ((1UL << SWP_TYPE_BITS) - 1))
+#define hl_swp_offset(x)   ((x).val >> H_PTE_RPN_SHIFT)
+#define hl_swp_entry(type, offset) ((swp_entry_t) {\
+   ((type) << H_PAGE_BIT_SWAP_TYPE)\
+   | ((offset) << H_PTE_RPN_SHIFT) })
 /*
  * swp_entry_t must be independent of pte bits. We build a swp_entry_t from
  * swap type and offset we get from swap and convert that to pte to find a
  * matching pte in linux page table.
  * Clear bits not found in swap entries here.
  */
-#define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val((pte)) & 
~H_PAGE_PTE })
-#define __swp_entry_to_pte(x)  __pte((x).val | H_PAGE_PTE)
+#define hl_pte_to_swp_entry(pte)   ((swp_entry_t) { pte_val((pte)) & 
~H_PAGE_PTE })
+#define hl_swp_entry_to_pte(x) __pte((x).val | H_PAGE_PTE)
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
 #define H_PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + H_PAGE_BIT_SWAP_TYPE))
@@ -270,17 +261,17 @@
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
+static inline pte_t hl_pte_swp_mksoft_dirty(pte_t pte)
 {
return __pte(pte_val(pte) | H_PAGE_SWP_SOFT_DIRTY);
 }
 
-static inline bool pte_swp_soft_dirty(pte_t pte)
+static inline bool hl_pte_swp_soft_dirty(pte_t pte)
 {
return !!(pte_val(pte) & H_PAGE_SWP_SOFT_DIRTY);
 }
 
-static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
+static inline pte_t hl_pte_swp_clear_soft_dirty(pte_t pte)
 {
return __pte(pte_val(pte) & ~H_PAGE_SWP_SOFT_DIRTY);
 }
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 43f393616a5d..446c85192cd4 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -5,6 +5,7 @@
  * the ppc64 hashed page table.
  */
 
+#define SWP_TYPE_BITS 5
 #include 
 #include 
 
@@ -325,6 +326,62 @@ static inline void set_pte_at(struct mm_struct *mm, 
unsigned long addr,
 {
return set_hlpte_at(mm, addr, ptep, pte);
 }
+/*
+ * Swap definitions
+ */
+
+/* Encode and de-code a swap entry */
+#define MAX_SWAPFILES_CHECK() do { \
+   BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS);  \
+   /*  \
+* Don't have overlapping bits with _PAGE_HPTEFLAGS \
+* We filter HPTEFLAGS on set_pte.  \
+*/ \
+   BUILD_BUG_ON(H_PAGE_HPTEFLAGS & (0x1f << 
H_PAGE_BIT_SWAP_TYPE)); \
+   BUILD_BUG_ON(H_PAGE_HPTEFLAGS & H_PAGE_SWP_SOFT_DIRTY); \
+   } while (0)
+/*
+ * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
+ */
+static inline swp_entry_t __pte_to_swp_entry(pte_t pte)
+{
+   return 

[PATCH V3 28/30] powerpc/mm: Hash linux abstraction for page table allocator

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 .../include/asm/book3s/64/pgalloc-hash-4k.h|  26 ++---
 .../include/asm/book3s/64/pgalloc-hash-64k.h   |  23 ++--
 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h  |  36 +--
 arch/powerpc/include/asm/book3s/64/pgalloc.h   | 118 +
 4 files changed, 148 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
index d1d67e585ad4..ae6480e2111b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
@@ -1,30 +1,30 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H
 #define _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H
 
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
+static inline void hlpmd_populate(struct mm_struct *mm, pmd_t *pmd,
pgtable_t pte_page)
 {
pmd_set(pmd, (unsigned long)page_address(pte_page));
 }
 
-static inline pgtable_t pmd_pgtable(pmd_t pmd)
+static inline pgtable_t hlpmd_pgtable(pmd_t pmd)
 {
return pmd_page(pmd);
 }
 
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
- unsigned long address)
+static inline pte_t *hlpte_alloc_one_kernel(struct mm_struct *mm,
+   unsigned long address)
 {
return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
 }
 
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
- unsigned long address)
+static inline pgtable_t hlpte_alloc_one(struct mm_struct *mm,
+   unsigned long address)
 {
struct page *page;
pte_t *pte;
 
-   pte = pte_alloc_one_kernel(mm, address);
+   pte = hlpte_alloc_one_kernel(mm, address);
if (!pte)
return NULL;
page = virt_to_page(pte);
@@ -35,12 +35,12 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
return page;
 }
 
-static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+static inline void hlpte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
free_page((unsigned long)pte);
 }
 
-static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
+static inline void hlpte_free(struct mm_struct *mm, pgtable_t ptepage)
 {
pgtable_page_dtor(ptepage);
__free_page(ptepage);
@@ -58,7 +58,7 @@ static inline void pgtable_free(void *table, unsigned 
index_size)
 
 #ifdef CONFIG_SMP
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-   void *table, int shift)
+ void *table, int shift)
 {
unsigned long pgf = (unsigned long)table;
BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
@@ -75,14 +75,14 @@ static inline void __tlb_remove_table(void *_table)
 }
 #else /* !CONFIG_SMP */
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-   void *table, int shift)
+ void *table, int shift)
 {
pgtable_free(table, shift);
 }
 #endif /* CONFIG_SMP */
 
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
- unsigned long address)
+static inline void __hlpte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
+   unsigned long address)
 {
tlb_flush_pgtable(tlb, address);
pgtable_page_dtor(table);
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
index e2dab4f64316..cb382773397f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
@@ -4,45 +4,42 @@
 extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
 extern void page_table_free(struct mm_struct *, unsigned long *, int);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
-#ifdef CONFIG_SMP
-extern void __tlb_remove_table(void *_table);
-#endif
 
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
-   pgtable_t pte_page)
+static inline void hlpmd_populate(struct mm_struct *mm, pmd_t *pmd,
+ pgtable_t pte_page)
 {
pmd_set(pmd, (unsigned long)pte_page);
 }
 
-static inline pgtable_t pmd_pgtable(pmd_t pmd)
+static inline pgtable_t hlpmd_pgtable(pmd_t pmd)
 {
return (pgtable_t)pmd_page_vaddr(pmd);
 }
 
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
- unsigned long address)
+static inline pte_t *hlpte_alloc_one_kernel(struct mm_struct *mm,
+   unsigned long address)
 {
return (pte_t *)page_table_alloc(mm, address, 1);
 

[PATCH V3 27/30] powerpc/mm: Hash linux abstraction for HugeTLB

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 10 
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 14 +--
 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h |  7 ++
 arch/powerpc/include/asm/book3s/64/pgalloc.h  |  9 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 30 +++
 arch/powerpc/include/asm/hugetlb.h|  4 ---
 arch/powerpc/include/asm/nohash/pgalloc.h |  7 ++
 arch/powerpc/mm/hugetlbpage-hash64.c  | 11 -
 arch/powerpc/mm/hugetlbpage.c | 16 
 9 files changed, 86 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 1ef4b39f96fd..5fc9e4e1db5f 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -66,23 +66,23 @@
 /*
  * For 4k page size, we support explicit hugepage via hugepd
  */
-static inline int pmd_huge(pmd_t pmd)
+static inline int hlpmd_huge(pmd_t pmd)
 {
return 0;
 }
 
-static inline int pud_huge(pud_t pud)
+static inline int hlpud_huge(pud_t pud)
 {
return 0;
 }
 
-static inline int pgd_huge(pgd_t pgd)
+static inline int hlpgd_huge(pgd_t pgd)
 {
return 0;
 }
 #define pgd_huge pgd_huge
 
-static inline int hugepd_ok(hugepd_t hpd)
+static inline int hlhugepd_ok(hugepd_t hpd)
 {
/*
 * if it is not a pte and have hugepd shift mask
@@ -93,7 +93,7 @@ static inline int hugepd_ok(hugepd_t hpd)
return true;
return false;
 }
-#define is_hugepd(hpd) (hugepd_ok(hpd))
+#define is_hlhugepd(hpd)   (hlhugepd_ok(hpd))
 #endif
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index e697fc528c0a..4fff8b12ba0f 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -146,7 +146,7 @@ extern bool __rpte_sub_valid(real_pte_t rpte, unsigned long 
index);
  * Defined in such a way that we can optimize away code block at build time
  * if CONFIG_HUGETLB_PAGE=n.
  */
-static inline int pmd_huge(pmd_t pmd)
+static inline int hlpmd_huge(pmd_t pmd)
 {
/*
 * leaf pte for huge page
@@ -154,7 +154,7 @@ static inline int pmd_huge(pmd_t pmd)
return !!(pmd_val(pmd) & H_PAGE_PTE);
 }
 
-static inline int pud_huge(pud_t pud)
+static inline int hlpud_huge(pud_t pud)
 {
/*
 * leaf pte for huge page
@@ -162,7 +162,7 @@ static inline int pud_huge(pud_t pud)
return !!(pud_val(pud) & H_PAGE_PTE);
 }
 
-static inline int pgd_huge(pgd_t pgd)
+static inline int hlpgd_huge(pgd_t pgd)
 {
/*
 * leaf pte for huge page
@@ -172,19 +172,19 @@ static inline int pgd_huge(pgd_t pgd)
 #define pgd_huge pgd_huge
 
 #ifdef CONFIG_DEBUG_VM
-extern int hugepd_ok(hugepd_t hpd);
-#define is_hugepd(hpd)   (hugepd_ok(hpd))
+extern int hlhugepd_ok(hugepd_t hpd);
+#define is_hlhugepd(hpd)   (hlhugepd_ok(hpd))
 #else
 /*
  * With 64k page size, we have hugepage ptes in the pgd and pmd entries. We 
don't
  * need to setup hugepage directory for them. Our pte and page directory format
  * enable us to have this enabled.
  */
-static inline int hugepd_ok(hugepd_t hpd)
+static inline int hlhugepd_ok(hugepd_t hpd)
 {
return 0;
 }
-#define is_hugepd(pdep)0
+#define is_hlhugepd(pdep)  0
 #endif /* CONFIG_DEBUG_VM */
 
 #endif /* CONFIG_HUGETLB_PAGE */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc-hash.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc-hash.h
index dbf680970c12..1dcfe7b75f06 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc-hash.h
@@ -56,4 +56,11 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
 {
pgtable_free_tlb(tlb, pud, H_PUD_INDEX_SIZE);
 }
+
+extern pte_t *huge_hlpte_alloc(struct mm_struct *mm, unsigned long addr,
+  unsigned long sz);
+extern void hugetlb_free_hlpgd_range(struct mmu_gather *tlb, unsigned long 
addr,
+unsigned long end, unsigned long floor,
+unsigned long ceiling);
+
 #endif /* _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index ff3c0e36fe3d..fa2ddda14b3d 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -66,4 +66,13 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, 
pmd_t *pmd,
 #include 
 #endif
 
+#ifdef CONFIG_HUGETLB_PAGE
+static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned 
long addr,
+

[PATCH V3 26/30] powerpc/mm: Hash linux abstraction for THP

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  42 ---
 arch/powerpc/include/asm/book3s/64/hash.h |  16 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 161 +-
 arch/powerpc/mm/pgtable-hash64.c  |  64 +-
 4 files changed, 208 insertions(+), 75 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 8008c9a89416..e697fc528c0a 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -190,11 +190,19 @@ static inline int hugepd_ok(hugepd_t hpd)
 #endif /* CONFIG_HUGETLB_PAGE */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-extern unsigned long pmd_hugepage_update(struct mm_struct *mm,
-unsigned long addr,
-pmd_t *pmdp,
-unsigned long clr,
-unsigned long set);
+
+extern pmd_t pfn_hlpmd(unsigned long pfn, pgprot_t pgprot);
+extern pmd_t mk_hlpmd(struct page *page, pgprot_t pgprot);
+extern pmd_t hlpmd_modify(pmd_t pmd, pgprot_t newprot);
+extern int hl_has_transparent_hugepage(void);
+extern void set_hlpmd_at(struct mm_struct *mm, unsigned long addr,
+pmd_t *pmdp, pmd_t pmd);
+
+extern unsigned long hlpmd_hugepage_update(struct mm_struct *mm,
+  unsigned long addr,
+  pmd_t *pmdp,
+  unsigned long clr,
+  unsigned long set);
 static inline char *get_hpte_slot_array(pmd_t *pmdp)
 {
/*
@@ -253,51 +261,55 @@ static inline void mark_hpte_slot_valid(unsigned char 
*hpte_slot_array,
  * that for explicit huge pages.
  *
  */
-static inline int pmd_trans_huge(pmd_t pmd)
+static inline int hlpmd_trans_huge(pmd_t pmd)
 {
return !!((pmd_val(pmd) & (H_PAGE_PTE | H_PAGE_THP_HUGE)) ==
  (H_PAGE_PTE | H_PAGE_THP_HUGE));
 }
 
-static inline int pmd_large(pmd_t pmd)
+static inline int hlpmd_large(pmd_t pmd)
 {
return !!(pmd_val(pmd) & H_PAGE_PTE);
 }
 
-static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+static inline pmd_t hlpmd_mknotpresent(pmd_t pmd)
 {
return __pmd(pmd_val(pmd) & ~H_PAGE_PRESENT);
 }
 
-#define __HAVE_ARCH_PMD_SAME
-static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
+static inline pmd_t hlpmd_mkhuge(pmd_t pmd)
+{
+   return __pmd(pmd_val(pmd) | (H_PAGE_PTE | H_PAGE_THP_HUGE));
+}
+
+static inline int hlpmd_same(pmd_t pmd_a, pmd_t pmd_b)
 {
return (((pmd_val(pmd_a) ^ pmd_val(pmd_b)) & ~H_PAGE_HPTEFLAGS) == 0);
 }
 
-static inline int __pmdp_test_and_clear_young(struct mm_struct *mm,
+static inline int __hlpmdp_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pmd_t *pmdp)
 {
unsigned long old;
 
if ((pmd_val(*pmdp) & (H_PAGE_ACCESSED | H_PAGE_HASHPTE)) == 0)
return 0;
-   old = pmd_hugepage_update(mm, addr, pmdp, H_PAGE_ACCESSED, 0);
+   old = hlpmd_hugepage_update(mm, addr, pmdp, H_PAGE_ACCESSED, 0);
return ((old & H_PAGE_ACCESSED) != 0);
 }
 
-#define __HAVE_ARCH_PMDP_SET_WRPROTECT
-static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
+static inline void hlpmdp_set_wrprotect(struct mm_struct *mm, unsigned long 
addr,
  pmd_t *pmdp)
 {
 
if ((pmd_val(*pmdp) & H_PAGE_RW) == 0)
return;
 
-   pmd_hugepage_update(mm, addr, pmdp, H_PAGE_RW, 0);
+   hlpmd_hugepage_update(mm, addr, pmdp, H_PAGE_RW, 0);
 }
 
 #endif /*  CONFIG_TRANSPARENT_HUGEPAGE */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_64K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 551daeee6870..c9403f94c9fc 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -589,6 +589,22 @@ static inline void hpte_do_hugepage_flush(struct mm_struct 
*mm,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+extern int hlpmdp_set_access_flags(struct vm_area_struct *vma,
+  unsigned long address, pmd_t *pmdp,
+  pmd_t entry, int dirty);
+extern int hlpmdp_test_and_clear_young(struct vm_area_struct *vma,
+  unsigned long address, pmd_t *pmdp);
+extern pmd_t hlpmdp_huge_get_and_clear(struct mm_struct *mm,
+  unsigned long addr, pmd_t *pmdp);
+extern pmd_t hlpmdp_collapse_flush(struct vm_area_struct *vma,
+  unsigned long address, pmd_t *pmdp);
+extern void hlpgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+

[PATCH V3 25/30] powerpc/mm: Hash linux abstractions for early init routines

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  6 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 61 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  | 92 +++
 arch/powerpc/include/asm/mmu.h| 25 
 arch/powerpc/mm/hash_utils_64.c   |  6 +-
 5 files changed, 115 insertions(+), 75 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/mmu.h

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 16f513e5cbd7..b82e063494dd 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_MMU_HASH32_H_
-#define _ASM_POWERPC_MMU_HASH32_H_
+#ifndef _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
+#define _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
 /*
  * 32-bit hash table MMU support
  */
@@ -90,4 +90,4 @@ typedef struct {
 #define mmu_virtual_psize  MMU_PAGE_4K
 #define mmu_linear_psize   MMU_PAGE_256M
 
-#endif /* _ASM_POWERPC_MMU_HASH32_H_ */
+#endif /* _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_ */
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index a34f3687e093..5db7c344d969 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_MMU_HASH64_H_
-#define _ASM_POWERPC_MMU_HASH64_H_
+#ifndef _ASM_POWERPC_BOOK3S_64_MMU_HASH_H_
+#define _ASM_POWERPC_BOOK3S_64_MMU_HASH_H_
 /*
  * PowerPC64 memory management structures
  *
@@ -127,24 +127,6 @@ extern struct hash_pte *htab_address;
 extern unsigned long htab_size_bytes;
 extern unsigned long htab_hash_mask;
 
-/*
- * Page size definition
- *
- *shift : is the "PAGE_SHIFT" value for that page size
- *sllp  : is a bit mask with the value of SLB L || LP to be or'ed
- *directly to a slbmte "vsid" value
- *penc  : is the HPTE encoding mask for the "LP" field:
- *
- */
-struct mmu_psize_def
-{
-   unsigned intshift;  /* number of bits */
-   int penc[MMU_PAGE_COUNT];   /* HPTE encoding */
-   unsigned inttlbiel; /* tlbiel supported for that page size */
-   unsigned long   avpnm;  /* bits to mask out in AVPN in the HPTE */
-   unsigned long   sllp;   /* SLB L||LP (exact mask to use in slbmte) */
-};
-extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
 static inline int shift_to_mmu_psize(unsigned int shift)
 {
@@ -210,11 +192,6 @@ static inline int segment_shift(int ssize)
 /*
  * The current system page and segment sizes
  */
-extern int mmu_linear_psize;
-extern int mmu_virtual_psize;
-extern int mmu_vmalloc_psize;
-extern int mmu_vmemmap_psize;
-extern int mmu_io_psize;
 extern int mmu_kernel_ssize;
 extern int mmu_highuser_ssize;
 extern u16 mmu_slb_size;
@@ -512,38 +489,6 @@ static inline void subpage_prot_free(struct mm_struct *mm) 
{}
 static inline void subpage_prot_init_new_context(struct mm_struct *mm) { }
 #endif /* CONFIG_PPC_SUBPAGE_PROT */
 
-typedef unsigned long mm_context_id_t;
-struct spinlock;
-
-typedef struct {
-   mm_context_id_t id;
-   u16 user_psize; /* page size index */
-
-#ifdef CONFIG_PPC_MM_SLICES
-   u64 low_slices_psize;   /* SLB page size encodings */
-   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-#else
-   u16 sllp;   /* SLB page size encoding */
-#endif
-   unsigned long vdso_base;
-#ifdef CONFIG_PPC_SUBPAGE_PROT
-   struct subpage_prot_table spt;
-#endif /* CONFIG_PPC_SUBPAGE_PROT */
-#ifdef CONFIG_PPC_ICSWX
-   struct spinlock *cop_lockp; /* guard acop and cop_pid */
-   unsigned long acop; /* mask of enabled coprocessor types */
-   unsigned int cop_pid;   /* pid value used with coprocessors */
-#endif /* CONFIG_PPC_ICSWX */
-#ifdef CONFIG_PPC_64K_PAGES
-   /* for 4K PTE fragment support */
-   void *pte_frag;
-#endif
-#ifdef CONFIG_SPAPR_TCE_IOMMU
-   struct list_head iommu_group_mem_list;
-#endif
-} mm_context_t;
-
-
 #if 0
 /*
  * The code below is equivalent to this function for arguments
@@ -610,4 +555,4 @@ static inline unsigned long get_kernel_vsid(unsigned long 
ea, int ssize)
 }
 #endif /* __ASSEMBLY__ */
 
-#endif /* _ASM_POWERPC_MMU_HASH64_H_ */
+#endif /* _ASM_POWERPC_BOOK3S_64_MMU_HASH_H_ */
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
new file mode 100644
index ..44a47bc81fb2
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -0,0 +1,92 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_MMU_H_
+#define _ASM_POWERPC_BOOK3S_64_MMU_H_
+
+#ifndef __ASSEMBLY__
+/*
+ * Page size definition
+ *
+ *shift : is the "PAGE_SHIFT" value for that page size
+ *sllp  : is a bit mask with the value of SLB L || LP to be or'ed
+ *directly to a slbmte "vsid" value
+ *

[PATCH V3 24/30] powerpc/mm: Move hash related mmu-*.h headers to book3s/

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/{mmu-hash32.h => book3s/32/mmu-hash.h} | 0
 arch/powerpc/include/asm/{mmu-hash64.h => book3s/64/mmu-hash.h} | 0
 arch/powerpc/include/asm/mmu.h  | 4 ++--
 arch/powerpc/kernel/idle_power7.S   | 2 +-
 arch/powerpc/kvm/book3s_32_mmu_host.c   | 2 +-
 arch/powerpc/kvm/book3s_64_mmu.c| 2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c   | 2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
 arch/powerpc/kvm/book3s_64_vio.c| 2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 12 files changed, 11 insertions(+), 11 deletions(-)
 rename arch/powerpc/include/asm/{mmu-hash32.h => book3s/32/mmu-hash.h} (100%)
 rename arch/powerpc/include/asm/{mmu-hash64.h => book3s/64/mmu-hash.h} (100%)

diff --git a/arch/powerpc/include/asm/mmu-hash32.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-hash32.h
rename to arch/powerpc/include/asm/book3s/32/mmu-hash.h
diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-hash64.h
rename to arch/powerpc/include/asm/book3s/64/mmu-hash.h
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 54d46504733d..8ca1c983bf6c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -183,10 +183,10 @@ static inline void assert_pte_locked(struct mm_struct 
*mm, unsigned long addr)
 
 #if defined(CONFIG_PPC_STD_MMU_64)
 /* 64-bit classic hash table MMU */
-#  include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 #elif defined(CONFIG_PPC_STD_MMU_32)
 /* 32-bit classic hash table MMU */
-#  include <asm/mmu-hash32.h>
+#include <asm/book3s/32/mmu-hash.h>
 #elif defined(CONFIG_40x)
 /* 40x-style software loaded TLB */
 #  include 
diff --git a/arch/powerpc/kernel/idle_power7.S 
b/arch/powerpc/kernel/idle_power7.S
index cf4fb5429cf1..470ceebd2d23 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -19,7 +19,7 @@
 #include 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 
 #undef DEBUG
 
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 55c4d51ea3e2..999106991a76 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -22,7 +22,7 @@
 
 #include 
 #include 
-#include <asm/mmu-hash32.h>
+#include <asm/book3s/32/mmu-hash.h>
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 9bf7031a67ff..b9131aa1aedf 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -26,7 +26,7 @@
 #include 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 
 /* #define DEBUG_MMU */
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 30fc2d83dffa..d7959b2a8b32 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -23,7 +23,7 @@
 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fb37290a57b4..c7b78d8336b2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -32,7 +32,7 @@
 #include 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 54cf9bc94dad..9c3b76bb69d9 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -30,7 +30,7 @@
 #include 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 89e96b3e0039..039028d3ccb5 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -29,7 +29,7 @@
 #include 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 91700518bbf3..4cb8db05f3e5 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6ee26de9a1de..c613fee0b9f7 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-#include <asm/mmu-hash64.h>
+#include <asm/book3s/64/mmu-hash.h>
 #include 
 
 #define VCPU_GPRS_TM(reg) 

[PATCH V3 22/30] powerpc/mm: Hash linux abstraction for functions in pgtable-hash.c

2016-02-18 Thread Aneesh Kumar K.V
We will later make the generic functions do conditional radix or hash
page table access. This patch doesn't update the hugepage API yet.
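
As an illustration only (not part of the diff below), a conditional generic
accessor would roughly take the shape sketched here; radix_enabled() and the
radix_set_pte_at() helper are assumptions about the future radix side, only
the hash ("hl") path exists in this series:

static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
                              pte_t *ptep, pte_t pte)
{
        if (radix_enabled()) {
                /* future radix implementation, not part of this series */
                radix_set_pte_at(mm, addr, ptep, pte);
                return;
        }
        /* hash implementation provided by this patch */
        set_hlpte_at(mm, addr, ptep, pte);
}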

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 13 
 arch/powerpc/include/asm/book3s/64/hash.h| 12 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 47 +++-
 arch/powerpc/include/asm/book3s/pgtable.h|  4 ---
 arch/powerpc/include/asm/nohash/64/pgtable.h |  4 ++-
 arch/powerpc/include/asm/nohash/pgtable.h| 11 +++
 arch/powerpc/include/asm/pgtable.h   | 13 
 arch/powerpc/mm/init_64.c|  3 --
 arch/powerpc/mm/pgtable-hash64.c | 34 ++--
 9 files changed, 101 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 38b33dcfcc9d..539609c8a77b 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -102,6 +102,9 @@ extern unsigned long ioremap_bot;
 #define pte_clear(mm, addr, ptep) \
do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)
 
+extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+  pte_t pte);
+
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(pmd_val(pmd) & _PMD_BAD)
 #definepmd_present(pmd)(pmd_val(pmd) & _PMD_PRESENT_MASK)
@@ -477,6 +480,16 @@ static inline pgprot_t pgprot_writecombine(pgprot_t prot)
return pgprot_noncached_wc(prot);
 }
 
+/*
+ * This gets called at the end of handling a page fault, when
+ * the kernel has put a new PTE into the page table for the process.
+ * We use it to ensure coherency between the i-cache and d-cache
+ * for the page which has just been mapped in.
+ * On machines which use an MMU hash table, we use this to put a
+ * corresponding HPTE into the hash table ahead of time, instead of
+ * waiting for the inevitable extra hash-table miss exception.
+ */
+extern void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);
 #endif /* !__ASSEMBLY__ */
 
 #endif /*  _ASM_POWERPC_BOOK3S_32_PGTABLE_H */
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index d80c4c7fa6c1..551daeee6870 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -589,7 +589,17 @@ static inline void hpte_do_hugepage_flush(struct mm_struct 
*mm,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-extern int map_kernel_page(unsigned long ea, unsigned long pa, int flags);
+extern int hlmap_kernel_page(unsigned long ea, unsigned long pa, int flags);
+extern void hlpgtable_cache_init(void);
+extern void __meminit hlvmemmap_create_mapping(unsigned long start,
+  unsigned long page_size,
+  unsigned long phys);
+extern void hlvmemmap_remove_mapping(unsigned long start,
+unsigned long page_size);
+extern void set_hlpte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+pte_t pte);
+extern void hlupdate_mmu_cache(struct vm_area_struct *vma, unsigned long 
address,
+  pte_t *ptep);
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cf400803e61c..005f0e265f37 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -320,6 +320,12 @@ static inline int pte_present(pte_t pte)
return hlpte_present(pte);
 }
 
+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+   return set_hlpte_at(mm, addr, ptep, pte);
+}
+
 static inline void pmd_set(pmd_t *pmdp, unsigned long val)
 {
*pmdp = __pmd(val);
@@ -462,7 +468,46 @@ extern struct page *pgd_page(pgd_t pgd);
pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
-void pgtable_cache_init(void);
+static inline void pgtable_cache_init(void)
+{
+   return hlpgtable_cache_init();
+}
+
+static inline int map_kernel_page(unsigned long ea, unsigned long pa,
+ unsigned long flags)
+{
+   return hlmap_kernel_page(ea, pa, flags);
+}
+
+static inline void __meminit vmemmap_create_mapping(unsigned long start,
+   unsigned long page_size,
+   unsigned long phys)
+{
+   return hlvmemmap_create_mapping(start, page_size, phys);
+}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline void 

[PATCH V3 23/30] powerpc/mm: Hash linux abstraction for mmu context handling code

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/mmu_context.h | 63 +++---
 arch/powerpc/kernel/swsusp.c   |  2 +-
 arch/powerpc/mm/mmu_context_hash64.c   | 16 -
 arch/powerpc/mm/mmu_context_nohash.c   |  3 +-
 drivers/cpufreq/pmac32-cpufreq.c   |  2 +-
 drivers/macintosh/via-pmu.c|  4 +--
 6 files changed, 57 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 878c27771717..5124b721da6e 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -10,11 +10,6 @@
 #include 
 #include 
 
-/*
- * Most if the context management is out of line
- */
-extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
@@ -33,16 +28,50 @@ extern long mm_iommu_ua_to_hpa(struct 
mm_iommu_table_group_mem_t *mem,
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
 #endif
+/*
+ * Most of the context management is out of line
+ */
+#ifdef CONFIG_PPC_BOOK3S_64
+extern int hlinit_new_context(struct task_struct *tsk, struct mm_struct *mm);
+static inline int init_new_context(struct task_struct *tsk, struct mm_struct 
*mm)
+{
+   return hlinit_new_context(tsk, mm);
+}
+
+extern void hldestroy_context(struct mm_struct *mm);
+static inline void destroy_context(struct mm_struct *mm)
+{
+   return hldestroy_context(mm);
+}
 
-extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
 extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
-extern void set_context(unsigned long id, pgd_t *pgd);
+static inline void switch_mmu_context(struct mm_struct *prev,
+ struct mm_struct *next,
+ struct task_struct *tsk)
+{
+   return switch_slb(tsk, next);
+}
 
-#ifdef CONFIG_PPC_BOOK3S_64
-extern int __init_new_context(void);
-extern void __destroy_context(int context_id);
+extern void set_context(unsigned long id, pgd_t *pgd);
+extern int __hlinit_new_context(void);
+static inline int __init_new_context(void)
+{
+   return __hlinit_new_context();
+}
+extern void __hldestroy_context(int context_id);
+static inline void __destroy_context(int context_id)
+{
+   return __hldestroy_context(context_id);
+}
 static inline void mmu_context_init(void) { }
 #else
+extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
+extern void destroy_context(struct mm_struct *mm);
+
+extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next,
+  struct task_struct *tsk);
+extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
+extern void set_context(unsigned long id, pgd_t *pgd);
 extern unsigned long __init_new_context(void);
 extern void __destroy_context(unsigned long context_id);
 extern void mmu_context_init(void);
@@ -88,17 +117,11 @@ static inline void switch_mm(struct mm_struct *prev, 
struct mm_struct *next,
if (cpu_has_feature(CPU_FTR_ALTIVEC))
asm volatile ("dssall");
 #endif /* CONFIG_ALTIVEC */
-
-   /* The actual HW switching method differs between the various
-* sub architectures.
+   /*
+* The actual HW switching method differs between the various
+* sub architectures. Out of line for now
 */
-#ifdef CONFIG_PPC_STD_MMU_64
-   switch_slb(tsk, next);
-#else
-   /* Out of line for now */
-   switch_mmu_context(prev, next);
-#endif
-
+   switch_mmu_context(prev, next, tsk);
 }
 
 #define deactivate_mm(tsk,mm)  do { } while (0)
diff --git a/arch/powerpc/kernel/swsusp.c b/arch/powerpc/kernel/swsusp.c
index 6669b1752512..6ae9bd5086a4 100644
--- a/arch/powerpc/kernel/swsusp.c
+++ b/arch/powerpc/kernel/swsusp.c
@@ -31,6 +31,6 @@ void save_processor_state(void)
 void restore_processor_state(void)
 {
 #ifdef CONFIG_PPC32
-   switch_mmu_context(current->active_mm, current->active_mm);
+   switch_mmu_context(current->active_mm, current->active_mm, NULL);
 #endif
 }
diff --git a/arch/powerpc/mm/mmu_context_hash64.c 
b/arch/powerpc/mm/mmu_context_hash64.c
index ff9baa5d2944..9c147d800760 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -30,7 +30,7 @@
 static DEFINE_SPINLOCK(mmu_context_lock);
 static DEFINE_IDA(mmu_context_ida);
 
-int __init_new_context(void)
+int __hlinit_new_context(void)
 {
int index;
int err;
@@ -59,11 +59,11 @@ again:
 }
 EXPORT_SYMBOL_GPL(__init_new_context);
 
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+int hlinit_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
int index;
 
-   

[PATCH V3 21/30] powerpc/mm: Hash linux abstraction for page table accessors

2016-02-18 Thread Aneesh Kumar K.V
We will later make the generic functions do conditional radix or hash
page table access. This patch doesn't update the hugepage API yet.
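
For illustration only (not part of the diff below), the generic accessors end
up as thin wrappers around the hash ("hl") variants introduced here; a sketch
of the pattern, with the radix branch still to come later:

static inline unsigned long pte_update(struct mm_struct *mm, unsigned long addr,
                                       pte_t *ptep, unsigned long clr,
                                       unsigned long set, int huge)
{
        /* dispatch to the hash implementation renamed by this patch */
        return hlpte_update(mm, addr, ptep, clr, set, huge);
}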

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 133 +++---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 251 ++-
 arch/powerpc/mm/hash_utils_64.c  |   6 +-
 3 files changed, 324 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 890c81014dc7..d80c4c7fa6c1 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -221,18 +221,18 @@
 #define H_PUD_BAD_BITS (H_PMD_TABLE_SIZE-1)
 
 #ifndef __ASSEMBLY__
-#definepmd_bad(pmd)(!is_kernel_addr(pmd_val(pmd)) \
+#definehlpmd_bad(pmd)  (!is_kernel_addr(pmd_val(pmd))  
\
 || (pmd_val(pmd) & H_PMD_BAD_BITS))
-#define pmd_page_vaddr(pmd)(pmd_val(pmd) & ~H_PMD_MASKED_BITS)
+#define hlpmd_page_vaddr(pmd)  (pmd_val(pmd) & ~H_PMD_MASKED_BITS)
 
-#definepud_bad(pud)(!is_kernel_addr(pud_val(pud)) \
+#definehlpud_bad(pud)  (!is_kernel_addr(pud_val(pud))  
\
 || (pud_val(pud) & H_PUD_BAD_BITS))
-#define pud_page_vaddr(pud)(pud_val(pud) & ~H_PUD_MASKED_BITS)
+#define hlpud_page_vaddr(pud)  (pud_val(pud) & ~H_PUD_MASKED_BITS)
 
-#define pgd_index(address) (((address) >> (H_PGDIR_SHIFT)) & (H_PTRS_PER_PGD - 
1))
-#define pud_index(address) (((address) >> (H_PUD_SHIFT)) & (H_PTRS_PER_PUD - 
1))
-#define pmd_index(address) (((address) >> (H_PMD_SHIFT)) & (H_PTRS_PER_PMD - 
1))
-#define pte_index(address) (((address) >> (PAGE_SHIFT)) & (H_PTRS_PER_PTE - 1))
+#define hlpgd_index(address) (((address) >> (H_PGDIR_SHIFT)) & (H_PTRS_PER_PGD 
- 1))
+#define hlpud_index(address) (((address) >> (H_PUD_SHIFT)) & (H_PTRS_PER_PUD - 
1))
+#define hlpmd_index(address) (((address) >> (H_PMD_SHIFT)) & (H_PTRS_PER_PMD - 
1))
+#define hlpte_index(address) (((address) >> (PAGE_SHIFT)) & (H_PTRS_PER_PTE - 
1))
 
 /* Encode and de-code a swap entry */
 #define MAX_SWAPFILES_CHECK() do { \
@@ -290,11 +290,11 @@ extern void hpte_need_flush(struct mm_struct *mm, 
unsigned long addr,
pte_t *ptep, unsigned long pte, int huge);
 extern unsigned long htab_convert_pte_flags(unsigned long pteflags);
 /* Atomic PTE updates */
-static inline unsigned long pte_update(struct mm_struct *mm,
-  unsigned long addr,
-  pte_t *ptep, unsigned long clr,
-  unsigned long set,
-  int huge)
+static inline unsigned long hlpte_update(struct mm_struct *mm,
+unsigned long addr,
+pte_t *ptep, unsigned long clr,
+unsigned long set,
+int huge)
 {
unsigned long old, tmp;
 
@@ -327,42 +327,41 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
  * We should be more intelligent about this but for the moment we override
  * these functions and force a tlb flush unconditionally
  */
-static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
+static inline int __hlptep_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pte_t *ptep)
 {
unsigned long old;
 
if ((pte_val(*ptep) & (H_PAGE_ACCESSED | H_PAGE_HASHPTE)) == 0)
return 0;
-   old = pte_update(mm, addr, ptep, H_PAGE_ACCESSED, 0, 0);
+   old = hlpte_update(mm, addr, ptep, H_PAGE_ACCESSED, 0, 0);
return (old & H_PAGE_ACCESSED) != 0;
 }
 
-#define __HAVE_ARCH_PTEP_SET_WRPROTECT
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
+static inline void hlptep_set_wrprotect(struct mm_struct *mm, unsigned long 
addr,
  pte_t *ptep)
 {
 
if ((pte_val(*ptep) & H_PAGE_RW) == 0)
return;
 
-   pte_update(mm, addr, ptep, H_PAGE_RW, 0, 0);
+   hlpte_update(mm, addr, ptep, H_PAGE_RW, 0, 0);
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+static inline void huge_hlptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
if ((pte_val(*ptep) & H_PAGE_RW) == 0)
return;
 
-   pte_update(mm, addr, ptep, H_PAGE_RW, 0, 1);
+   hlpte_update(mm, addr, ptep, H_PAGE_RW, 0, 1);
 }
 
 
 /* Set the dirty and/or accessed bits atomically in a linux PTE, this
  * function doesn't need to flush the hash entry
  */
-static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
+static inline void 

[PATCH V3 20/30] powerpc/mm: Create a new headers for tlbflush for hash64

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 94 ++
 arch/powerpc/include/asm/tlbflush.h| 92 +
 2 files changed, 95 insertions(+), 91 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
new file mode 100644
index ..1b753f96b374
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -0,0 +1,94 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_TLBFLUSH_HASH_H
+#define _ASM_POWERPC_BOOK3S_64_TLBFLUSH_HASH_H
+
+#define MMU_NO_CONTEXT 0
+
+/*
+ * TLB flushing for 64-bit hash-MMU CPUs
+ */
+
+#include 
+#include 
+
+#define PPC64_TLB_BATCH_NR 192
+
+struct ppc64_tlb_batch {
+   int active;
+   unsigned long   index;
+   struct mm_struct*mm;
+   real_pte_t  pte[PPC64_TLB_BATCH_NR];
+   unsigned long   vpn[PPC64_TLB_BATCH_NR];
+   unsigned intpsize;
+   int ssize;
+};
+DECLARE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
+
+extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch);
+
+#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
+
+static inline void arch_enter_lazy_mmu_mode(void)
+{
+   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
+
+   batch->active = 1;
+}
+
+static inline void arch_leave_lazy_mmu_mode(void)
+{
+   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
+
+   if (batch->index)
+   __flush_tlb_pending(batch);
+   batch->active = 0;
+}
+
+#define arch_flush_lazy_mmu_mode()  do {} while (0)
+
+
+extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
+   int ssize, unsigned long flags);
+extern void flush_hash_range(unsigned long number, int local);
+extern void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
+   pmd_t *pmdp, unsigned int psize, int ssize,
+   unsigned long flags);
+
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+}
+
+static inline void flush_tlb_mm(struct mm_struct *mm)
+{
+}
+
+static inline void local_flush_tlb_page(struct vm_area_struct *vma,
+   unsigned long vmaddr)
+{
+}
+
+static inline void flush_tlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+}
+
+static inline void flush_tlb_page_nohash(struct vm_area_struct *vma,
+unsigned long vmaddr)
+{
+}
+
+static inline void flush_tlb_range(struct vm_area_struct *vma,
+  unsigned long start, unsigned long end)
+{
+}
+
+static inline void flush_tlb_kernel_range(unsigned long start,
+ unsigned long end)
+{
+}
+
+/* Private function for use by PCI IO mapping code */
+extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
+unsigned long end);
+extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd,
+   unsigned long addr);
+#endif /*  _ASM_POWERPC_BOOK3S_64_TLBFLUSH_HASH_H */
diff --git a/arch/powerpc/include/asm/tlbflush.h 
b/arch/powerpc/include/asm/tlbflush.h
index 23d351ca0303..9f77f85e3e99 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -78,97 +78,7 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
 }
 
 #elif defined(CONFIG_PPC_STD_MMU_64)
-
-#define MMU_NO_CONTEXT 0
-
-/*
- * TLB flushing for 64-bit hash-MMU CPUs
- */
-
-#include 
-#include 
-
-#define PPC64_TLB_BATCH_NR 192
-
-struct ppc64_tlb_batch {
-   int active;
-   unsigned long   index;
-   struct mm_struct*mm;
-   real_pte_t  pte[PPC64_TLB_BATCH_NR];
-   unsigned long   vpn[PPC64_TLB_BATCH_NR];
-   unsigned intpsize;
-   int ssize;
-};
-DECLARE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
-
-extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch);
-
-#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
-
-static inline void arch_enter_lazy_mmu_mode(void)
-{
-   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
-
-   batch->active = 1;
-}
-
-static inline void arch_leave_lazy_mmu_mode(void)
-{
-   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
-
-   if (batch->index)
-   __flush_tlb_pending(batch);
-   batch->active = 0;
-}
-
-#define arch_flush_lazy_mmu_mode()  do {} while (0)
-
-
-extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
-   int ssize, unsigned long flags);
-extern void 

[PATCH V3 16/30] powerpc/mm: Rename hash specific page table bits (_PAGE* -> H_PAGE*)

2016-02-18 Thread Aneesh Kumar K.V
This patch renames _PAGE* -> H_PAGE*. This enables us to support
different page table formats in the same kernel. Between radix and
hash we will have different bit positions to indicate pte states like
dirty, present etc. In order to enable a single kernel to support both
radix and hash page table formats, a pte accessor needs to check for
different bits depending on which config we are using.

For example, a pte dirty check would end up as:

static inline bool pte_dirty(pte_t pte)
{
        if (radix_enabled())
                return pte_val(pte) & _RPAGE_DIRTY;
        else
                return pte_val(pte) & H_PAGE_DIRTY;
}

Some of the defines used by the core kernel, like PMD_SHIFT, are now
converted to variables so that we can avoid a conditional there, i.e. we do
something along the lines of the sketch below.
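
(A minimal sketch of such a conversion; the __pmd_shift variable name is
hypothetical here, not taken from this series.)

/* hash keeps its compile-time constant: */
#define H_PMD_SHIFT     (PAGE_SHIFT + H_PTE_INDEX_SIZE)

/* what the core kernel sees becomes a variable, filled in at boot
 * depending on whether the hash or radix layout is in use: */
extern unsigned long __pmd_shift;
#define PMD_SHIFT       __pmd_shift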

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  60 ++--
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 111 ---
 arch/powerpc/include/asm/book3s/64/hash.h | 334 +++---
 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h |  16 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  75 -
 arch/powerpc/include/asm/kvm_book3s_64.h  |  10 +-
 arch/powerpc/include/asm/mmu-hash64.h |   4 +-
 arch/powerpc/include/asm/page_64.h|   2 +-
 arch/powerpc/include/asm/pte-common.h |   3 +
 arch/powerpc/kernel/asm-offsets.c |   9 +-
 arch/powerpc/kernel/pci_64.c  |   3 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c |   2 +-
 arch/powerpc/mm/copro_fault.c |   8 +-
 arch/powerpc/mm/hash64_4k.c   |  25 +-
 arch/powerpc/mm/hash64_64k.c  |  63 ++--
 arch/powerpc/mm/hash_native_64.c  |  10 +-
 arch/powerpc/mm/hash_utils_64.c   |  99 ---
 arch/powerpc/mm/hugepage-hash64.c |  24 +-
 arch/powerpc/mm/hugetlbpage-hash64.c  |  46 +--
 arch/powerpc/mm/mmu_context_hash64.c  |   4 +-
 arch/powerpc/mm/pgtable-hash64.c  |  42 +--
 arch/powerpc/mm/pgtable_64.c  |  98 +--
 arch/powerpc/mm/slb.c |   8 +-
 arch/powerpc/mm/slb_low.S |   4 +-
 arch/powerpc/mm/slice.c   |   2 +-
 arch/powerpc/mm/tlb_hash64.c  |   8 +-
 arch/powerpc/platforms/cell/spu_base.c|   6 +-
 arch/powerpc/platforms/cell/spufs/fault.c |   4 +-
 arch/powerpc/platforms/ps3/spu.c  |   2 +-
 arch/powerpc/platforms/pseries/lpar.c |  12 +-
 drivers/char/agp/uninorth-agp.c   |   9 +-
 drivers/misc/cxl/fault.c  |   6 +-
 32 files changed, 639 insertions(+), 470 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index c78f5928001b..1ef4b39f96fd 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -5,56 +5,56 @@
  * for each page table entry.  The PMD and PGD level use a 32b record for
  * each entry by assuming that each entry is page aligned.
  */
-#define PTE_INDEX_SIZE  9
-#define PMD_INDEX_SIZE  7
-#define PUD_INDEX_SIZE  9
-#define PGD_INDEX_SIZE  9
+#define H_PTE_INDEX_SIZE  9
+#define H_PMD_INDEX_SIZE  7
+#define H_PUD_INDEX_SIZE  9
+#define H_PGD_INDEX_SIZE  9
 
 #ifndef __ASSEMBLY__
-#define PTE_TABLE_SIZE (sizeof(pte_t) << PTE_INDEX_SIZE)
-#define PMD_TABLE_SIZE (sizeof(pmd_t) << PMD_INDEX_SIZE)
-#define PUD_TABLE_SIZE (sizeof(pud_t) << PUD_INDEX_SIZE)
-#define PGD_TABLE_SIZE (sizeof(pgd_t) << PGD_INDEX_SIZE)
+#define H_PTE_TABLE_SIZE   (sizeof(pte_t) << H_PTE_INDEX_SIZE)
+#define H_PMD_TABLE_SIZE   (sizeof(pmd_t) << H_PMD_INDEX_SIZE)
+#define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
+#define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
 #endif /* __ASSEMBLY__ */
 
-#define PTRS_PER_PTE   (1 << PTE_INDEX_SIZE)
-#define PTRS_PER_PMD   (1 << PMD_INDEX_SIZE)
-#define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
-#define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
+#define H_PTRS_PER_PTE (1 << H_PTE_INDEX_SIZE)
+#define H_PTRS_PER_PMD (1 << H_PMD_INDEX_SIZE)
+#define H_PTRS_PER_PUD (1 << H_PUD_INDEX_SIZE)
+#define H_PTRS_PER_PGD (1 << H_PGD_INDEX_SIZE)
 
 /* PMD_SHIFT determines what a second-level page table entry can map */
-#define PMD_SHIFT  (PAGE_SHIFT + PTE_INDEX_SIZE)
-#define PMD_SIZE   (1UL << PMD_SHIFT)
-#define PMD_MASK   (~(PMD_SIZE-1))
+#define H_PMD_SHIFT(PAGE_SHIFT + H_PTE_INDEX_SIZE)
+#define H_PMD_SIZE (1UL << H_PMD_SHIFT)
+#define H_PMD_MASK (~(H_PMD_SIZE-1))
 
 /* With 4k base page size, hugepage PTEs go at the PMD level */
-#define MIN_HUGEPTE_SHIFT  PMD_SHIFT
+#define MIN_HUGEPTE_SHIFT  H_PMD_SHIFT
 
 /* PUD_SHIFT determines what a third-level page table entry can map */
-#define PUD_SHIFT  (PMD_SHIFT + 

[PATCH V3 17/30] powerpc/mm: Use flush_tlb_page in ptep_clear_flush_young

2016-02-18 Thread Aneesh Kumar K.V
This should not have any impact on the hash Linux implementation, but radix
would require us to flush the TLB after clearing the accessed bit. Also move
code that is not dependent on pte bits to the generic header.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 45 +---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 39 
 arch/powerpc/include/asm/mmu-hash64.h|  2 +-
 3 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 0bcd9f0d16c8..890c81014dc7 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -319,6 +319,14 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
return old;
 }
 
+/*
+ * We currently remove entries from the hashtable regardless of whether
+ * the entry was young or dirty. The generic routines only flush if the
+ * entry was young or dirty which is not good enough.
+ *
+ * We should be more intelligent about this but for the moment we override
+ * these functions and force a tlb flush unconditionally
+ */
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pte_t *ptep)
 {
@@ -329,13 +337,6 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
old = pte_update(mm, addr, ptep, H_PAGE_ACCESSED, 0, 0);
return (old & H_PAGE_ACCESSED) != 0;
 }
-#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-#define ptep_test_and_clear_young(__vma, __addr, __ptep)  \
-({\
-   int __r;   \
-   __r = __ptep_test_and_clear_young((__vma)->vm_mm, __addr, __ptep); \
-   __r;   \
-})
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
@@ -357,36 +358,6 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
pte_update(mm, addr, ptep, H_PAGE_RW, 0, 1);
 }
 
-/*
- * We currently remove entries from the hashtable regardless of whether
- * the entry was young or dirty. The generic routines only flush if the
- * entry was young or dirty which is not good enough.
- *
- * We should be more intelligent about this but for the moment we override
- * these functions and force a tlb flush unconditionally
- */
-#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
-#define ptep_clear_flush_young(__vma, __address, __ptep)   \
-({ \
-   int __young = __ptep_test_and_clear_young((__vma)->vm_mm, __address, \
- __ptep);  \
-   __young;\
-})
-
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
-}
-
-static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
-pte_t * ptep)
-{
-   pte_update(mm, addr, ptep, ~0UL, 0, 0);
-}
-
 
 /* Set the dirty and/or accessed bits atomically in a linux PTE, this
  * function doesn't need to flush the hash entry
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a57f425043ee..2e6d2362748b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -8,6 +8,10 @@
 #include 
 #include 
 
+#ifndef __ASSEMBLY__
+#include 
+#include 
+#endif
 /*
  * The second half of the kernel virtual space is used for IO mappings,
  * it's itself carved into the PIO region (ISA and PHB IO space) and
@@ -129,6 +133,41 @@ extern unsigned long ioremap_bot;
 
 #endif /* __real_pte */
 
+#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
+   unsigned long address,
+   pte_t *ptep)
+{
+   return  __ptep_test_and_clear_young(vma->vm_mm, address, ptep);
+}
+
+#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
+static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
+unsigned long address, pte_t *ptep)
+{
+   int young;
+
+   young = __ptep_test_and_clear_young(vma->vm_mm, address, ptep);
+   if (young)
+   flush_tlb_page(vma, address);
+   return young;
+}
+
+#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
+static inline pte_t ptep_get_and_clear(struct 

[PATCH V3 19/30] powerpc/mm: Use generic version of pmdp_clear_flush_young

2016-02-18 Thread Aneesh Kumar K.V
The radix variant is going to require a flush_tlb_range. We then can't
have this as a static inline because of the usage of HPAGE_PMD_SIZE, so
we are forced to make it a function, in which case we can simply use the
generic version.
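
For reference, the generic pmdp_clear_flush_young() in mm/pgtable-generic.c
looks roughly like the sketch below (quoted from memory, so treat it as an
approximation); it is the flush_tlb_range()/HPAGE_PMD_SIZE usage that rules
out keeping a static inline in the powerpc headers:

int pmdp_clear_flush_young(struct vm_area_struct *vma,
                           unsigned long address, pmd_t *pmdp)
{
        int young;

        VM_BUG_ON(address & ~HPAGE_PMD_MASK);
        young = pmdp_test_and_clear_young(vma, address, pmdp);
        if (young)
                flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
        return young;
}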

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 ---
 arch/powerpc/mm/pgtable-hash64.c | 10 ++
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 2e6d2362748b..437f632d6185 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -333,9 +333,6 @@ extern int pmdp_set_access_flags(struct vm_area_struct *vma,
 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
 extern int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 unsigned long address, pmd_t *pmdp);
-#define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
-extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp);
 
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 extern pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index fa30c8d9561a..c11e19f68b6d 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -350,12 +350,6 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, 
unsigned long address,
return pmd;
 }
 
-int pmdp_test_and_clear_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp)
-{
-   return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
-}
-
 /*
  * We currently remove entries from the hashtable regardless of whether
  * the entry was young or dirty. The generic routines only flush if the
@@ -364,8 +358,8 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
  * We should be more intelligent about this but for the moment we override
  * these functions and force a tlb flush unconditionally
  */
-int pmdp_clear_flush_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp)
+int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
 {
return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
 }
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V3 18/30] powerpc/mm: THP is only available on hash64 as of now

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable-hash64.c | 374 +++
 arch/powerpc/mm/pgtable_64.c | 374 ---
 2 files changed, 374 insertions(+), 374 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index 4813a3c2d457..fa30c8d9561a 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -21,6 +21,9 @@
 
 #include "mmu_decl.h"
 
+#define CREATE_TRACE_POINTS
+#include 
+
 #if H_PGTABLE_RANGE > USER_VSID_RANGE
 #warning Limited user VSID range means pagetable space is wasted
 #endif
@@ -245,3 +248,374 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
/* Perform the setting of the PTE */
__set_pte_at(mm, addr, ptep, pte, 0);
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+
+/*
+ * This is called when relaxing access to a hugepage. It's also called in the 
page
+ * fault path when we don't hit any of the major fault cases, ie, a minor
+ * update of _PAGE_ACCESSED, _PAGE_DIRTY, etc... The generic code will have
+ * handled those two for us, we additionally deal with missing execute
+ * permission here on some processors
+ */
+int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp, pmd_t entry, int dirty)
+{
+   int changed;
+#ifdef CONFIG_DEBUG_VM
+   WARN_ON(!pmd_trans_huge(*pmdp));
+   assert_spin_locked(&vma->vm_mm->page_table_lock);
+#endif
+   changed = !pmd_same(*(pmdp), entry);
+   if (changed) {
+   __ptep_set_access_flags(pmdp_ptep(pmdp), pmd_pte(entry));
+   /*
+* Since we are not supporting SW TLB systems, we don't
+* have any thing similar to flush_tlb_page_nohash()
+*/
+   }
+   return changed;
+}
+
+unsigned long pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, unsigned long clr,
+ unsigned long set)
+{
+
+   unsigned long old, tmp;
+
+#ifdef CONFIG_DEBUG_VM
+   WARN_ON(!pmd_trans_huge(*pmdp));
+   assert_spin_locked(&mm->page_table_lock);
+#endif
+
+#ifdef PTE_ATOMIC_UPDATES
+   __asm__ __volatile__(
+   "1: ldarx   %0,0,%3\n\
+   andi.   %1,%0,%6\n\
+   bne-1b \n\
+   andc%1,%0,%4 \n\
+   or  %1,%1,%7\n\
+   stdcx.  %1,0,%3 \n\
+   bne-1b"
+   : "=" (old), "=" (tmp), "=m" (*pmdp)
+   : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (H_PAGE_BUSY), "r" (set)
+   : "cc" );
+#else
+   old = pmd_val(*pmdp);
+   *pmdp = __pmd((old & ~clr) | set);
+#endif
+   trace_hugepage_update(addr, old, clr, set);
+   if (old & H_PAGE_HASHPTE)
+   hpte_do_hugepage_flush(mm, addr, pmdp, old);
+   return old;
+}
+
+pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+   pmd_t pmd;
+
+   VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+   VM_BUG_ON(pmd_trans_huge(*pmdp));
+
+   pmd = *pmdp;
+   pmd_clear(pmdp);
+   /*
+* Wait for all pending hash_page to finish. This is needed
+* in case of subpage collapse. When we collapse normal pages
+* to hugepage, we first clear the pmd, then invalidate all
+* the PTE entries. The assumption here is that any low level
+* page fault will see a none pmd and take the slow path that
+* will wait on mmap_sem. But we could very well be in a
+* hash_page with local ptep pointer value. Such a hash page
+* can result in adding new HPTE entries for normal subpages.
+* That means we could be modifying the page content as we
+* copy them to a huge page. So wait for parallel hash_page
+* to finish before invalidating HPTE entries. We can do this
+* by sending an IPI to all the cpus and executing a dummy
+* function there.
+*/
+   kick_all_cpus_sync();
+   /*
+* Now invalidate the hpte entries in the range
+* covered by pmd. This make sure we take a
+* fault and will find the pmd as none, which will
+* result in a major fault which takes mmap_sem and
+* hence wait for collapse to complete. Without this
+* the __collapse_huge_page_copy can result in copying
+* the old content.
+*/
+   flush_tlb_pmd_range(vma->vm_mm, &pmd, address);
+   return pmd;
+}
+
+int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+   return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
+}
+
+/*
+ * We currently remove entries from the hashtable regardless of whether
+ * the entry was young or dirty. The generic routines only flush if the
+ * entry was 

[PATCH V3 15/30] powerpc/mm: Move hash page table related functions to pgtable-hash64.c

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h|   1 +
 arch/powerpc/include/asm/nohash/64/pgtable.h |   2 +
 arch/powerpc/mm/Makefile |   3 +-
 arch/powerpc/mm/init_64.c| 114 +
 arch/powerpc/mm/mem.c|  29 +---
 arch/powerpc/mm/mmu_decl.h   |   4 -
 arch/powerpc/mm/pgtable-book3e.c | 163 ++
 arch/powerpc/mm/pgtable-hash64.c | 247 +++
 arch/powerpc/mm/pgtable.c|   9 +
 arch/powerpc/mm/pgtable_64.c |  88 --
 arch/powerpc/mm/ppc_mmu_32.c |  30 
 11 files changed, 462 insertions(+), 228 deletions(-)
 create mode 100644 arch/powerpc/mm/pgtable-book3e.c
 create mode 100644 arch/powerpc/mm/pgtable-hash64.c

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index e88573440bbe..05a048bc4a64 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -603,6 +603,7 @@ static inline void hpte_do_hugepage_flush(struct mm_struct 
*mm,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+extern int map_kernel_page(unsigned long ea, unsigned long pa, int flags);
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index b9f734dd5b81..a68e809d7739 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -359,6 +359,8 @@ static inline void __ptep_set_access_flags(pte_t *ptep, 
pte_t entry)
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
+extern int map_kernel_page(unsigned long ea, unsigned long pa, int flags);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_NOHASH_64_PGTABLE_H */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 1ffeda85c086..6b5cc805c7ba 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -13,7 +13,8 @@ obj-$(CONFIG_PPC_MMU_NOHASH)  += mmu_context_nohash.o 
tlb_nohash.o \
   tlb_nohash_low.o
 obj-$(CONFIG_PPC_BOOK3E)   += tlb_low_$(CONFIG_WORD_SIZE)e.o
 hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o
-obj-$(CONFIG_PPC_STD_MMU_64)   += hash_utils_64.o slb_low.o slb.o $(hash64-y)
+obj-$(CONFIG_PPC_BOOK3E_64)   += pgtable-book3e.o
+obj-$(CONFIG_PPC_STD_MMU_64)   += pgtable-hash64.o hash_utils_64.o slb_low.o 
slb.o $(hash64-y)
 obj-$(CONFIG_PPC_STD_MMU_32)   += ppc_mmu_32.o hash_low_32.o
 obj-$(CONFIG_PPC_STD_MMU)  += tlb_hash$(CONFIG_WORD_SIZE).o \
   mmu_context_hash$(CONFIG_WORD_SIZE).o
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 8ce1ec24d573..05b025a0efe6 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -65,38 +65,10 @@
 
 #include "mmu_decl.h"
 
-#ifdef CONFIG_PPC_STD_MMU_64
-#if PGTABLE_RANGE > USER_VSID_RANGE
-#warning Limited user VSID range means pagetable space is wasted
-#endif
-
-#if (TASK_SIZE_USER64 < PGTABLE_RANGE) && (TASK_SIZE_USER64 < USER_VSID_RANGE)
-#warning TASK_SIZE is smaller than it needs to be.
-#endif
-#endif /* CONFIG_PPC_STD_MMU_64 */
-
 phys_addr_t memstart_addr = ~0;
 EXPORT_SYMBOL_GPL(memstart_addr);
 phys_addr_t kernstart_addr;
 EXPORT_SYMBOL_GPL(kernstart_addr);
-
-static void pgd_ctor(void *addr)
-{
-   memset(addr, 0, PGD_TABLE_SIZE);
-}
-
-static void pud_ctor(void *addr)
-{
-   memset(addr, 0, PUD_TABLE_SIZE);
-}
-
-static void pmd_ctor(void *addr)
-{
-   memset(addr, 0, PMD_TABLE_SIZE);
-}
-
-struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
-
 /*
  * Create a kmem_cache() for pagetables.  This is not used for PTE
  * pages - they're linked to struct page, come from the normal free
@@ -104,6 +76,7 @@ struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
  * everything else.  Caches created by this function are used for all
  * the higher level pagetables, and for hugepage pagetables.
  */
+struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
 {
char *name;
@@ -138,25 +111,6 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void 
*))
pr_debug("Allocated pgtable cache for order %d\n", shift);
 }
 
-
-void pgtable_cache_init(void)
-{
-   pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
-   pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
-   /*
-* In all current configs, when the PUD index exists it's the
-* same size as either the pgd or pmd index except with THP enabled
-* on book3s 64
-*/
-   if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
-   pgtable_cache_add(PUD_INDEX_SIZE, pud_ctor);
-
-   if 

[PATCH V3 14/30] powerpc/mm: Move swap related definition to hash64 header

2016-02-18 Thread Aneesh Kumar K.V
They are dependent on hash pte bits, so move them to hash64 header

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 50 
 arch/powerpc/include/asm/book3s/64/pgtable.h | 50 
 2 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index c568eaa1c26d..e88573440bbe 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -236,6 +236,56 @@
 #define pmd_index(address) (((address) >> (PMD_SHIFT)) & (PTRS_PER_PMD - 1))
 #define pte_index(address) (((address) >> (PAGE_SHIFT)) & (PTRS_PER_PTE - 1))
 
+/* Encode and de-code a swap entry */
+#define MAX_SWAPFILES_CHECK() do { \
+   BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
+   /*  \
+* Don't have overlapping bits with _PAGE_HPTEFLAGS \
+* We filter HPTEFLAGS on set_pte.  \
+*/ \
+   BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
+   BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);   \
+   } while (0)
+/*
+ * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
+ */
+#define SWP_TYPE_BITS 5
+#define __swp_type(x)  (((x).val >> _PAGE_BIT_SWAP_TYPE) \
+   & ((1UL << SWP_TYPE_BITS) - 1))
+#define __swp_offset(x)((x).val >> PTE_RPN_SHIFT)
+#define __swp_entry(type, offset)  ((swp_entry_t) { \
+   ((type) << _PAGE_BIT_SWAP_TYPE) \
+   | ((offset) << PTE_RPN_SHIFT) })
+/*
+ * swp_entry_t must be independent of pte bits. We build a swp_entry_t from
+ * swap type and offset we get from swap and convert that to pte to find a
+ * matching pte in linux page table.
+ * Clear bits not found in swap entries here.
+ */
+#define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val((pte)) & 
~_PAGE_PTE })
+#define __swp_entry_to_pte(x)  __pte((x).val | _PAGE_PTE)
+
+#ifdef CONFIG_MEM_SOFT_DIRTY
+#define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
+#else
+#define _PAGE_SWP_SOFT_DIRTY   0UL
+#endif /* CONFIG_MEM_SOFT_DIRTY */
+
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_SWP_SOFT_DIRTY);
+}
+static inline bool pte_swp_soft_dirty(pte_t pte)
+{
+   return !!(pte_val(pte) & _PAGE_SWP_SOFT_DIRTY);
+}
+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
+{
+   return __pte(pte_val(pte) & ~_PAGE_SWP_SOFT_DIRTY);
+}
+#endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
+
 extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long pte, int huge);
 extern unsigned long htab_convert_pte_flags(unsigned long pteflags);
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8dafaa26f317..8840a2d205b4 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -156,56 +156,6 @@ extern struct page *pgd_page(pgd_t pgd);
 #define pgd_ERROR(e) \
pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
-/* Encode and de-code a swap entry */
-#define MAX_SWAPFILES_CHECK() do { \
-   BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
-   /*  \
-* Don't have overlapping bits with _PAGE_HPTEFLAGS \
-* We filter HPTEFLAGS on set_pte.  \
-*/ \
-   BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
-   BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);   \
-   } while (0)
-/*
- * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
- */
-#define SWP_TYPE_BITS 5
-#define __swp_type(x)  (((x).val >> _PAGE_BIT_SWAP_TYPE) \
-   & ((1UL << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)((x).val >> PTE_RPN_SHIFT)
-#define __swp_entry(type, offset)  ((swp_entry_t) { \
-   ((type) << _PAGE_BIT_SWAP_TYPE) \
-   | ((offset) << PTE_RPN_SHIFT) })
-/*
- * swp_entry_t must be independent of pte bits. We build a swp_entry_t from
- * swap type and offset we get from swap and convert that to pte to find a
- * matching pte in linux page table.
- * Clear bits not found in swap entries here.
- */
-#define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val((pte)) & 
~_PAGE_PTE })
-#define __swp_entry_to_pte(x)  __pte((x).val | _PAGE_PTE)
-
-#ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY   (1UL 

[PATCH V3 12/30] powerpc/mm: Use helper instead of opencoding

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index f06ad7354d68..23b0dd07f9ae 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -191,7 +191,7 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
 
 static inline pgtable_t pmd_pgtable(pmd_t pmd)
 {
-   return (pgtable_t)(pmd_val(pmd) & ~PMD_MASKED_BITS);
+   return (pgtable_t)pmd_page_vaddr(pmd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V3 13/30] powerpc/mm: Move hash64 specific definitions to separate header

2016-02-18 Thread Aneesh Kumar K.V
We will be adding a radix variant of these routines in the followup
patches. Move the hash64 variant into its own header so that we can
rename them easily later. Also split pgalloc 64k and 4k headers

Reviewed-by: Paul Mackerras 
Signed-off-by: Aneesh Kumar K.V 
---
 .../include/asm/book3s/64/pgalloc-hash-4k.h|  92 ++
 .../include/asm/book3s/64/pgalloc-hash-64k.h   |  51 ++
 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h  |  59 ++
 arch/powerpc/include/asm/book3s/64/pgalloc.h   | 199 +
 4 files changed, 210 insertions(+), 191 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
new file mode 100644
index ..d1d67e585ad4
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
@@ -0,0 +1,92 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H
+#define _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
+   pgtable_t pte_page)
+{
+   pmd_set(pmd, (unsigned long)page_address(pte_page));
+}
+
+static inline pgtable_t pmd_pgtable(pmd_t pmd)
+{
+   return pmd_page(pmd);
+}
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+ unsigned long address)
+{
+   return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
+ unsigned long address)
+{
+   struct page *page;
+   pte_t *pte;
+
+   pte = pte_alloc_one_kernel(mm, address);
+   if (!pte)
+   return NULL;
+   page = virt_to_page(pte);
+   if (!pgtable_page_ctor(page)) {
+   __free_page(page);
+   return NULL;
+   }
+   return page;
+}
+
+static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+   free_page((unsigned long)pte);
+}
+
+static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
+{
+   pgtable_page_dtor(ptepage);
+   __free_page(ptepage);
+}
+
+static inline void pgtable_free(void *table, unsigned index_size)
+{
+   if (!index_size)
+   free_page((unsigned long)table);
+   else {
+   BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
+   kmem_cache_free(PGT_CACHE(index_size), table);
+   }
+}
+
+#ifdef CONFIG_SMP
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   unsigned long pgf = (unsigned long)table;
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   pgf |= shift;
+   tlb_remove_table(tlb, (void *)pgf);
+}
+
+static inline void __tlb_remove_table(void *_table)
+{
+   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+   pgtable_free(table, shift);
+}
+#else /* !CONFIG_SMP */
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   pgtable_free(table, shift);
+}
+#endif /* CONFIG_SMP */
+
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
+ unsigned long address)
+{
+   tlb_flush_pgtable(tlb, address);
+   pgtable_page_dtor(table);
+   pgtable_free_tlb(tlb, page_address(table), 0);
+}
+
+#endif /* _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
new file mode 100644
index ..e2dab4f64316
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
@@ -0,0 +1,51 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_64K_H
+#define _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_64K_H
+
+extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
+extern void page_table_free(struct mm_struct *, unsigned long *, int);
+extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
+#ifdef CONFIG_SMP
+extern void __tlb_remove_table(void *_table);
+#endif
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
+   pgtable_t pte_page)
+{
+   pmd_set(pmd, (unsigned long)pte_page);
+}
+
+static inline pgtable_t pmd_pgtable(pmd_t pmd)
+{
+   return (pgtable_t)pmd_page_vaddr(pmd);
+}
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+ unsigned long address)
+{
+   return (pte_t *)page_table_alloc(mm, address, 1);
+}
+
+static inline 

[PATCH V3 11/30] powerpc/mm: free_hugepd_range split to hash and nonhash

2016-02-18 Thread Aneesh Kumar K.V
We don't strictly need to do this, but it enables us to avoid depending
on pgtable_free_tlb for radix.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hugetlbpage-book3e.c | 187 ++
 arch/powerpc/mm/hugetlbpage-hash64.c | 150 
 arch/powerpc/mm/hugetlbpage.c| 188 ---
 3 files changed, 337 insertions(+), 188 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c 
b/arch/powerpc/mm/hugetlbpage-book3e.c
index 4c43a104e35c..459d61855ff7 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -311,6 +311,193 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long 
addr, unsigned long sz
return hugepte_offset(*hpdp, addr, pdshift);
 }
 
+extern void hugepd_free(struct mmu_gather *tlb, void *hugepte);
+static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int 
pdshift,
+ unsigned long start, unsigned long end,
+ unsigned long floor, unsigned long ceiling)
+{
+   pte_t *hugepte = hugepd_page(*hpdp);
+   int i;
+
+   unsigned long pdmask = ~((1UL << pdshift) - 1);
+   unsigned int num_hugepd = 1;
+
+#ifdef CONFIG_PPC_FSL_BOOK3E
+   /* Note: On fsl the hpdp may be the first of several */
+   num_hugepd = (1 << (hugepd_shift(*hpdp) - pdshift));
+#else
+   unsigned int shift = hugepd_shift(*hpdp);
+#endif
+
+   start &= pdmask;
+   if (start < floor)
+   return;
+   if (ceiling) {
+   ceiling &= pdmask;
+   if (! ceiling)
+   return;
+   }
+   if (end - 1 > ceiling - 1)
+   return;
+
+   for (i = 0; i < num_hugepd; i++, hpdp++)
+   hpdp->pd = 0;
+
+#ifdef CONFIG_PPC_FSL_BOOK3E
+   hugepd_free(tlb, hugepte);
+#else
+   pgtable_free_tlb(tlb, hugepte, pdshift - shift);
+#endif
+}
+
+static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
+  unsigned long addr, unsigned long end,
+  unsigned long floor, unsigned long ceiling)
+{
+   pmd_t *pmd;
+   unsigned long next;
+   unsigned long start;
+
+   start = addr;
+   do {
+   pmd = pmd_offset(pud, addr);
+   next = pmd_addr_end(addr, end);
+   if (!is_hugepd(__hugepd(pmd_val(*pmd)))) {
+   /*
+* if it is not hugepd pointer, we should already find
+* it cleared.
+*/
+   WARN_ON(!pmd_none_or_clear_bad(pmd));
+   continue;
+   }
+#ifdef CONFIG_PPC_FSL_BOOK3E
+   /*
+* Increment next by the size of the huge mapping since
+* there may be more than one entry at this level for a
+* single hugepage, but all of them point to
+* the same kmem cache that holds the hugepte.
+*/
+   next = addr + (1 << hugepd_shift(*(hugepd_t *)pmd));
+#endif
+   free_hugepd_range(tlb, (hugepd_t *)pmd, PMD_SHIFT,
+ addr, next, floor, ceiling);
+   } while (addr = next, addr != end);
+
+   start &= PUD_MASK;
+   if (start < floor)
+   return;
+   if (ceiling) {
+   ceiling &= PUD_MASK;
+   if (!ceiling)
+   return;
+   }
+   if (end - 1 > ceiling - 1)
+   return;
+
+   pmd = pmd_offset(pud, start);
+   pud_clear(pud);
+   pmd_free_tlb(tlb, pmd, start);
+   mm_dec_nr_pmds(tlb->mm);
+}
+
+static void hugetlb_free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
+  unsigned long addr, unsigned long end,
+  unsigned long floor, unsigned long ceiling)
+{
+   pud_t *pud;
+   unsigned long next;
+   unsigned long start;
+
+   start = addr;
+   do {
+   pud = pud_offset(pgd, addr);
+   next = pud_addr_end(addr, end);
+   if (!is_hugepd(__hugepd(pud_val(*pud)))) {
+   if (pud_none_or_clear_bad(pud))
+   continue;
+   hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
+  ceiling);
+   } else {
+#ifdef CONFIG_PPC_FSL_BOOK3E
+   /*
+* Increment next by the size of the huge mapping since
+* there may be more than one entry at this level for a
+* single hugepage, but all of them point to
+* the same kmem cache that holds the hugepte.
+*/
+   next = addr + (1 << hugepd_shift(*(hugepd_t *)pud));
+#endif

[PATCH V3 10/30] powerpc/mm: Hugetlbfs is book3s_64 and fsl_book3e (32 or 64)

2016-02-18 Thread Aneesh Kumar K.V
We move a large part of the FSL related code to hugetlbpage-book3e.c.
This is only code movement; it also avoids #ifdefs in the code.

Even though we allow hugetlbfs only for book3s 64 and FSL book3e, I am
still retaining the #ifdef in hugetlbpage-book3e.c. It looks like there
was an attempt to support hugetlbfs on other non-hash platforms, and I
didn't want to lose that work.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/hugetlb.h   |   1 +
 arch/powerpc/mm/hugetlbpage-book3e.c | 293 +
 arch/powerpc/mm/hugetlbpage-hash64.c | 121 +++
 arch/powerpc/mm/hugetlbpage.c| 401 +--
 4 files changed, 416 insertions(+), 400 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 7eac89b9f02e..0525f1c29afb 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -47,6 +47,7 @@ static inline unsigned int hugepd_shift(hugepd_t hpd)
 
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
+#define hugepd_none(hpd)   ((hpd).pd == 0)
 
 static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
unsigned pdshift)
diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c 
b/arch/powerpc/mm/hugetlbpage-book3e.c
index 7e6d0880813f..4c43a104e35c 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -7,6 +7,39 @@
  */
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Tracks gpages after the device tree is scanned and before the
+ * huge_boot_pages list is ready.  On non-Freescale implementations, this is
+ * just used to track 16G pages and so is a single array.  FSL-based
+ * implementations may have more than one gpage size, so we need multiple
+ * arrays
+ */
+#ifdef CONFIG_PPC_FSL_BOOK3E
+#define MAX_NUMBER_GPAGES  128
+struct psize_gpages {
+   u64 gpage_list[MAX_NUMBER_GPAGES];
+   unsigned int nr_gpages;
+};
+static struct psize_gpages gpage_freearray[MMU_PAGE_COUNT];
+#endif
+
+/*
+ * These macros define how to determine which level of the page table holds
+ * the hpdp.
+ */
+#ifdef CONFIG_PPC_FSL_BOOK3E
+#define HUGEPD_PGD_SHIFT PGDIR_SHIFT
+#define HUGEPD_PUD_SHIFT PUD_SHIFT
+#else
+#define HUGEPD_PGD_SHIFT PUD_SHIFT
+#define HUGEPD_PUD_SHIFT PMD_SHIFT
+#endif
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
 #ifdef CONFIG_PPC64
@@ -197,3 +230,263 @@ void flush_hugetlb_page(struct vm_area_struct *vma, 
unsigned long vmaddr)
 
__flush_tlb_page(vma->vm_mm, vmaddr, tsize, 0);
 }
+
+static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
+  unsigned long address, unsigned pdshift, unsigned 
pshift)
+{
+   struct kmem_cache *cachep;
+   pte_t *new;
+
+   int i;
+   int num_hugepd = 1 << (pshift - pdshift);
+   cachep = hugepte_cache;
+
+   new = kmem_cache_zalloc(cachep, GFP_KERNEL|__GFP_REPEAT);
+
+   BUG_ON(pshift > HUGEPD_SHIFT_MASK);
+   BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
+
+   if (! new)
+   return -ENOMEM;
+
+   spin_lock(&mm->page_table_lock);
+   /*
+* We have multiple higher-level entries that point to the same
+* actual pte location.  Fill in each as we go and backtrack on error.
+* We need all of these so the DTLB pgtable walk code can find the
+* right higher-level entry without knowing if it's a hugepage or not.
+*/
+   for (i = 0; i < num_hugepd; i++, hpdp++) {
+   if (unlikely(!hugepd_none(*hpdp)))
+   break;
+   else
+   /* We use the old format for PPC_FSL_BOOK3E */
+   hpdp->pd = ((unsigned long)new & ~PD_HUGE) | pshift;
+   }
+   /* If we bailed from the for loop early, an error occurred, clean up */
+   if (i < num_hugepd) {
+   for (i = i - 1 ; i >= 0; i--, hpdp--)
+   hpdp->pd = 0;
+   kmem_cache_free(cachep, new);
+   }
+   spin_unlock(&mm->page_table_lock);
+   return 0;
+}
+
+pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long 
sz)
+{
+   pgd_t *pg;
+   pud_t *pu;
+   pmd_t *pm;
+   hugepd_t *hpdp = NULL;
+   unsigned pshift = __ffs(sz);
+   unsigned pdshift = PGDIR_SHIFT;
+
+   addr &= ~(sz-1);
+
+   pg = pgd_offset(mm, addr);
+
+   if (pshift >= HUGEPD_PGD_SHIFT) {
+   hpdp = (hugepd_t *)pg;
+   } else {
+   pdshift = PUD_SHIFT;
+   pu = pud_alloc(mm, pg, addr);
+   if (pshift >= HUGEPD_PUD_SHIFT) {
+   hpdp = (hugepd_t *)pu;
+   } else {
+   pdshift = PMD_SHIFT;
+   pm = pmd_alloc(mm, pu, addr);
+   hpdp = (hugepd_t *)pm;
+   }
+   }
+
+   if (!hpdp)
+   return NULL;
+
+   

[PATCH V3 09/30] powerpc/mm: Copy pgalloc (part 3)

2016-02-18 Thread Aneesh Kumar K.V
64-bit book3s now always has a 4-level page table irrespective of the
Linux page size. Move the related code out of the #ifdef.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 55 +---
 1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 5bb6852fa771..f06ad7354d68 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -51,7 +51,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
 }
 
-#ifndef CONFIG_PPC_64K_PAGES
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
 {
pgd_set(pgd, (unsigned long)pud);
@@ -79,6 +78,14 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, 
pmd_t *pmd,
pmd_set(pmd, (unsigned long)pte);
 }
 
+/*
+ * FIXME!!
+ * Between 4K and 64K pages, we differ in what is stored in pmd. ie.
+ * typedef pte_t *pgtable_t; -> 64K
+ * typedef struct page *pgtable_t; -> 4k
+ */
+#ifndef CONFIG_PPC_64K_PAGES
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
pgtable_t pte_page)
 {
@@ -176,36 +183,6 @@ extern void pgtable_free_tlb(struct mmu_gather *tlb, void 
*table, int shift);
 extern void __tlb_remove_table(void *_table);
 #endif
 
-#ifndef __PAGETABLE_PUD_FOLDED
-/* book3s 64 is 4 level page table */
-static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
-{
-   pgd_set(pgd, (unsigned long)pud);
-}
-
-static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   return kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
-   GFP_KERNEL|__GFP_REPEAT);
-}
-
-static inline void pud_free(struct mm_struct *mm, pud_t *pud)
-{
-   kmem_cache_free(PGT_CACHE(PUD_INDEX_SIZE), pud);
-}
-#endif
-
-static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
-{
-   pud_set(pud, (unsigned long)pmd);
-}
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
-  pte_t *pte)
-{
-   pmd_set(pmd, (unsigned long)pte);
-}
-
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
pgtable_t pte_page)
 {
@@ -258,13 +235,17 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
 }
 
-#define __pmd_free_tlb(tlb, pmd, addr)   \
-   pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
-#ifndef __PAGETABLE_PUD_FOLDED
-#define __pud_free_tlb(tlb, pud, addr)   \
-   pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
+  unsigned long address)
+{
+return pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX);
+}
 
-#endif /* __PAGETABLE_PUD_FOLDED */
+static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
+  unsigned long address)
+{
+pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE);
+}
 
 #define check_pgt_cache()  do { } while (0)
 
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V3 08/30] powerpc/mm: Copy pgalloc (part 2)

2016-02-18 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h   |  6 +++---
 arch/powerpc/include/asm/book3s/64/pgalloc.h   | 23 +++---
 arch/powerpc/include/asm/book3s/pgalloc.h  | 19 ++
 .../asm/{pgalloc-32.h => nohash/32/pgalloc.h}  |  0
 .../asm/{pgalloc-64.h => nohash/64/pgalloc.h}  |  0
 arch/powerpc/include/asm/nohash/pgalloc.h  | 23 ++
 arch/powerpc/include/asm/pgalloc.h | 19 +++---
 7 files changed, 64 insertions(+), 26 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/pgalloc.h
 rename arch/powerpc/include/asm/{pgalloc-32.h => nohash/32/pgalloc.h} (100%)
 rename arch/powerpc/include/asm/{pgalloc-64.h => nohash/64/pgalloc.h} (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/pgalloc.h

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 76d6b9e0c8a9..a2350194fc76 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGALLOC_32_H
-#define _ASM_POWERPC_PGALLOC_32_H
+#ifndef _ASM_POWERPC_BOOK3S_32_PGALLOC_H
+#define _ASM_POWERPC_BOOK3S_32_PGALLOC_H
 
 #include 
 
@@ -106,4 +106,4 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
pgtable_page_dtor(table);
pgtable_free_tlb(tlb, page_address(table), 0);
 }
-#endif /* _ASM_POWERPC_PGALLOC_32_H */
+#endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 014489a619d0..5bb6852fa771 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGALLOC_64_H
-#define _ASM_POWERPC_PGALLOC_64_H
+#ifndef _ASM_POWERPC_BOOK3S_64_PGALLOC_H
+#define _ASM_POWERPC_BOOK3S_64_PGALLOC_H
 /*
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -52,8 +52,10 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 }
 
 #ifndef CONFIG_PPC_64K_PAGES
-
-#define pgd_populate(MM, PGD, PUD) pgd_set(PGD, (unsigned long)PUD)
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+{
+   pgd_set(pgd, (unsigned long)pud);
+}
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
@@ -83,7 +85,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
pmd_set(pmd, (unsigned long)page_address(pte_page));
 }
 
-#define pmd_pgtable(pmd) pmd_page(pmd)
+static inline pgtable_t pmd_pgtable(pmd_t pmd)
+{
+   return pmd_page(pmd);
+}
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
  unsigned long address)
@@ -173,7 +178,11 @@ extern void __tlb_remove_table(void *_table);
 
 #ifndef __PAGETABLE_PUD_FOLDED
 /* book3s 64 is 4 level page table */
-#define pgd_populate(MM, PGD, PUD) pgd_set(PGD, PUD)
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+{
+   pgd_set(pgd, (unsigned long)pud);
+}
+
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
return kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
@@ -259,4 +268,4 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
 
 #define check_pgt_cache()  do { } while (0)
 
-#endif /* _ASM_POWERPC_PGALLOC_64_H */
+#endif /* _ASM_POWERPC_BOOK3S_64_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/pgalloc.h 
b/arch/powerpc/include/asm/book3s/pgalloc.h
new file mode 100644
index ..54f591e9572e
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/pgalloc.h
@@ -0,0 +1,19 @@
+#ifndef _ASM_POWERPC_BOOK3S_PGALLOC_H
+#define _ASM_POWERPC_BOOK3S_PGALLOC_H
+
+#include 
+
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
+unsigned long address)
+{
+
+}
+
+#ifdef CONFIG_PPC64
+#include 
+#else
+#include 
+#endif
+
+#endif /* _ASM_POWERPC_BOOK3S_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/pgalloc-32.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
similarity index 100%
rename from arch/powerpc/include/asm/pgalloc-32.h
rename to arch/powerpc/include/asm/nohash/32/pgalloc.h
diff --git a/arch/powerpc/include/asm/pgalloc-64.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
similarity index 100%
rename from arch/powerpc/include/asm/pgalloc-64.h
rename to arch/powerpc/include/asm/nohash/64/pgalloc.h
diff --git a/arch/powerpc/include/asm/nohash/pgalloc.h 
b/arch/powerpc/include/asm/nohash/pgalloc.h
new file mode 100644
index ..b39ec956d71e
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/pgalloc.h
@@ -0,0 +1,23 @@
+#ifndef 

[PATCH V3 07/30] powerpc/mm: Copy pgalloc (part 1)

2016-02-18 Thread Aneesh Kumar K.V
This patch makes a copy of the pgalloc routines for book3s. The idea is
to have a hash64 copy of these pgalloc routines which can later be
updated to have a radix conditional. Radix introduces a new page table
format with different page table sizes.

This mostly does:

cp pgalloc-32.h book3s/32/pgalloc.h
cp pgalloc-64.h book3s/64/pgalloc.h

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 109 +++
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 262 +++
 2 files changed, 371 insertions(+)
 create mode 100644 arch/powerpc/include/asm/book3s/32/pgalloc.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc.h

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
new file mode 100644
index ..76d6b9e0c8a9
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -0,0 +1,109 @@
+#ifndef _ASM_POWERPC_PGALLOC_32_H
+#define _ASM_POWERPC_PGALLOC_32_H
+
+#include 
+
+/* For 32-bit, all levels of page tables are just drawn from get_free_page() */
+#define MAX_PGTABLE_INDEX_SIZE 0
+
+extern void __bad_pte(pmd_t *pmd);
+
+extern pgd_t *pgd_alloc(struct mm_struct *mm);
+extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
+
+/*
+ * We don't have any real pmd's, and this code never triggers because
+ * the pgd will always be present..
+ */
+/* #define pmd_alloc_one(mm,address)   ({ BUG(); ((pmd_t *)2); }) */
+#define pmd_free(mm, x)do { } while (0)
+#define __pmd_free_tlb(tlb,x,a)do { } while (0)
+/* #define pgd_populate(mm, pmd, pte)  BUG() */
+
+#ifndef CONFIG_BOOKE
+
+static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
+  pte_t *pte)
+{
+   *pmdp = __pmd(__pa(pte) | _PMD_PRESENT);
+}
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
+   pgtable_t pte_page)
+{
+   *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_PRESENT);
+}
+
+#define pmd_pgtable(pmd) pmd_page(pmd)
+#else
+
+static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
+  pte_t *pte)
+{
+   *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
+}
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
+   pgtable_t pte_page)
+{
+   *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | 
_PMD_PRESENT);
+}
+
+#define pmd_pgtable(pmd) pmd_page(pmd)
+#endif
+
+extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
+extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
+
+static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+   free_page((unsigned long)pte);
+}
+
+static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
+{
+   pgtable_page_dtor(ptepage);
+   __free_page(ptepage);
+}
+
+static inline void pgtable_free(void *table, unsigned index_size)
+{
+   BUG_ON(index_size); /* 32-bit doesn't use this */
+   free_page((unsigned long)table);
+}
+
+#define check_pgt_cache()  do { } while (0)
+
+#ifdef CONFIG_SMP
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   unsigned long pgf = (unsigned long)table;
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   pgf |= shift;
+   tlb_remove_table(tlb, (void *)pgf);
+}
+
+static inline void __tlb_remove_table(void *_table)
+{
+   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+   pgtable_free(table, shift);
+}
+#else
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   pgtable_free(table, shift);
+}
+#endif
+
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
+ unsigned long address)
+{
+   tlb_flush_pgtable(tlb, address);
+   pgtable_page_dtor(table);
+   pgtable_free_tlb(tlb, page_address(table), 0);
+}
+#endif /* _ASM_POWERPC_PGALLOC_32_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
new file mode 100644
index ..014489a619d0
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -0,0 +1,262 @@
+#ifndef _ASM_POWERPC_PGALLOC_64_H
+#define _ASM_POWERPC_PGALLOC_64_H
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+
+struct vmemmap_backing {
+   struct vmemmap_backing *list;
+   unsigned long phys;
+

[PATCH V3 06/30] powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table

2016-02-18 Thread Aneesh Kumar K.V
This is needed so that we can support both hash and radix page tables
using a single kernel. A radix kernel uses a 4-level table.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 33 +--
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 20 +---
 arch/powerpc/include/asm/book3s/64/hash.h |  8 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 25 +++-
 arch/powerpc/include/asm/pgalloc-64.h | 24 ---
 arch/powerpc/include/asm/pgtable-types.h  | 13 +++
 arch/powerpc/mm/init_64.c | 21 -
 8 files changed, 90 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9faa18c4f3f7..599329332613 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -303,6 +303,7 @@ config ZONE_DMA32
 config PGTABLE_LEVELS
int
default 2 if !PPC64
+   default 4 if PPC_BOOK3S_64
default 3 if PPC_64K_PAGES
default 4
 
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index ea0414d6659e..c78f5928001b 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -57,39 +57,8 @@
 #define _PAGE_4K_PFN   0
 #ifndef __ASSEMBLY__
 /*
- * 4-level page tables related bits
+ * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range()
  */
-
-#define pgd_none(pgd)  (!pgd_val(pgd))
-#define pgd_bad(pgd)   (pgd_val(pgd) == 0)
-#define pgd_present(pgd)   (pgd_val(pgd) != 0)
-#define pgd_page_vaddr(pgd)(pgd_val(pgd) & ~PGD_MASKED_BITS)
-
-static inline void pgd_clear(pgd_t *pgdp)
-{
-   *pgdp = __pgd(0);
-}
-
-static inline pte_t pgd_pte(pgd_t pgd)
-{
-   return __pte(pgd_val(pgd));
-}
-
-static inline pgd_t pte_pgd(pte_t pte)
-{
-   return __pgd(pte_val(pte));
-}
-extern struct page *pgd_page(pgd_t pgd);
-
-#define pud_offset(pgdp, addr) \
-  (((pud_t *) pgd_page_vaddr(*(pgdp))) + \
-(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)))
-
-#define pud_ERROR(e) \
-   pr_err("%s:%d: bad pud %08lx.\n", __FILE__, __LINE__, pud_val(e))
-
-/*
- * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range() */
 #define remap_4k_pfn(vma, addr, pfn, prot) \
remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, (prot))
 
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 849bbec80f7b..5c9392b71a6b 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -1,15 +1,14 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_HASH_64K_H
 #define _ASM_POWERPC_BOOK3S_64_HASH_64K_H
 
-#include 
-
 #define PTE_INDEX_SIZE  8
-#define PMD_INDEX_SIZE  10
-#define PUD_INDEX_SIZE 0
+#define PMD_INDEX_SIZE  5
+#define PUD_INDEX_SIZE 5
 #define PGD_INDEX_SIZE  12
 
 #define PTRS_PER_PTE   (1 << PTE_INDEX_SIZE)
 #define PTRS_PER_PMD   (1 << PMD_INDEX_SIZE)
+#define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
 
 /* With 4k base page size, hugepage PTEs go at the PMD level */
@@ -20,8 +19,13 @@
 #define PMD_SIZE   (1UL << PMD_SHIFT)
 #define PMD_MASK   (~(PMD_SIZE-1))
 
+/* PUD_SHIFT determines what a third-level page table entry can map */
+#define PUD_SHIFT  (PMD_SHIFT + PMD_INDEX_SIZE)
+#define PUD_SIZE   (1UL << PUD_SHIFT)
+#define PUD_MASK   (~(PUD_SIZE-1))
+
 /* PGDIR_SHIFT determines what a third-level page table entry can map */
-#define PGDIR_SHIFT(PMD_SHIFT + PMD_INDEX_SIZE)
+#define PGDIR_SHIFT(PUD_SHIFT + PUD_INDEX_SIZE)
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
@@ -61,6 +65,8 @@
 #define PMD_MASKED_BITS(PTE_FRAG_SIZE - 1)
 /* Bits to mask out from a PGD/PUD to get to the PMD page */
 #define PUD_MASKED_BITS0x1ff
+/* FIXME!! check this */
+#define PGD_MASKED_BITS0
 
 #ifndef __ASSEMBLY__
 
@@ -130,11 +136,9 @@ extern bool __rpte_sub_valid(real_pte_t rpte, unsigned 
long index);
 #else
 #define PMD_TABLE_SIZE (sizeof(pmd_t) << PMD_INDEX_SIZE)
 #endif
+#define PUD_TABLE_SIZE (sizeof(pud_t) << PUD_INDEX_SIZE)
 #define PGD_TABLE_SIZE (sizeof(pgd_t) << PGD_INDEX_SIZE)
 
-#define pgd_pte(pgd)   (pud_pte(((pud_t){ pgd })))
-#define pte_pgd(pte)   ((pgd_t)pte_pud(pte))
-
 #ifdef CONFIG_HUGETLB_PAGE
 /*
  * We have PGD_INDEX_SIZ = 12 and PTE_INDEX_SIZE = 8, so that we can have
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 650f7e7b5410..c568eaa1c26d 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -232,6 +232,7 @@
 #define pud_page_vaddr(pud)(pud_val(pud) & ~PUD_MASKED_BITS)
 
 #define pgd_index(address) (((address) 

[PATCH V3 05/30] powerpc/mm: Don't have conditional defines for real_pte_t

2016-02-18 Thread Aneesh Kumar K.V
We move real_pte_t out of STRICT_MM_TYPECHECKS.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  5 -
 arch/powerpc/include/asm/pgtable-types.h | 26 +-
 2 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ac07a30a7934..bffb2872342b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -43,13 +43,8 @@
  */
 #ifndef __real_pte
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
 #define __real_pte(e,p)((real_pte_t){(e)})
 #define __rpte_to_pte(r)   ((r).pte)
-#else
-#define __real_pte(e,p)(e)
-#define __rpte_to_pte(r)   (__pte(r))
-#endif
 #define __rpte_to_hidx(r,index)(pte_val(__rpte_to_pte(r)) 
>>_PAGE_F_GIX_SHIFT)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)   \
diff --git a/arch/powerpc/include/asm/pgtable-types.h 
b/arch/powerpc/include/asm/pgtable-types.h
index 2fac0c4acfa4..71487e1ca638 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -12,15 +12,6 @@ static inline pte_basic_t pte_val(pte_t x)
return x.pte;
 }
 
-/* 64k pages additionally define a bigger "real PTE" type that gathers
- * the "second half" part of the PTE for pseudo 64k pages
- */
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef struct { pte_t pte; } real_pte_t;
-#endif
-
 /* PMD level */
 #ifdef CONFIG_PPC64
 typedef struct { unsigned long pmd; } pmd_t;
@@ -67,13 +58,6 @@ static inline pte_basic_t pte_val(pte_t pte)
return pte;
 }
 
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef pte_t real_pte_t;
-#endif
-
-
 #ifdef CONFIG_PPC64
 typedef unsigned long pmd_t;
 #define __pmd(x)   (x)
@@ -103,6 +87,14 @@ typedef unsigned long pgprot_t;
 #define pgprot_val(x)  (x)
 #define __pgprot(x)(x)
 
+#endif /* CONFIG_STRICT_MM_TYPECHECKS */
+/*
+ * With hash config 64k pages additionally define a bigger "real PTE" type that
+ * gathers the "second half" part of the PTE for pseudo 64k pages
+ */
+#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
+typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
+#else
+typedef struct { pte_t pte; } real_pte_t;
 #endif
-
 #endif /* _ASM_POWERPC_PGTABLE_TYPES_H */
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V3 04/30] powerpc/mm: Split pgtable types to separate header

2016-02-18 Thread Aneesh Kumar K.V
We move the page table accessors into a separate header. We will
later add a big endian variant of the table which is needed for radix.
No functional change, only code movement.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/page.h  | 104 +
 arch/powerpc/include/asm/pgtable-types.h | 108 +++
 2 files changed, 109 insertions(+), 103 deletions(-)
 create mode 100644 arch/powerpc/include/asm/pgtable-types.h

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index e34124f6fbf2..3a3f073f7222 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -281,109 +281,7 @@ extern long long virt_phys_offset;
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
-/* These are used to make use of C type-checking. */
-
-/* PTE level */
-typedef struct { pte_basic_t pte; } pte_t;
-#define __pte(x)   ((pte_t) { (x) })
-static inline pte_basic_t pte_val(pte_t x)
-{
-   return x.pte;
-}
-
-/* 64k pages additionally define a bigger "real PTE" type that gathers
- * the "second half" part of the PTE for pseudo 64k pages
- */
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef struct { pte_t pte; } real_pte_t;
-#endif
-
-/* PMD level */
-#ifdef CONFIG_PPC64
-typedef struct { unsigned long pmd; } pmd_t;
-#define __pmd(x)   ((pmd_t) { (x) })
-static inline unsigned long pmd_val(pmd_t x)
-{
-   return x.pmd;
-}
-
-/* PUD level exusts only on 4k pages */
-#ifndef CONFIG_PPC_64K_PAGES
-typedef struct { unsigned long pud; } pud_t;
-#define __pud(x)   ((pud_t) { (x) })
-static inline unsigned long pud_val(pud_t x)
-{
-   return x.pud;
-}
-#endif /* !CONFIG_PPC_64K_PAGES */
-#endif /* CONFIG_PPC64 */
-
-/* PGD level */
-typedef struct { unsigned long pgd; } pgd_t;
-#define __pgd(x)   ((pgd_t) { (x) })
-static inline unsigned long pgd_val(pgd_t x)
-{
-   return x.pgd;
-}
-
-/* Page protection bits */
-typedef struct { unsigned long pgprot; } pgprot_t;
-#define pgprot_val(x)  ((x).pgprot)
-#define __pgprot(x)((pgprot_t) { (x) })
-
-#else
-
-/*
- * .. while these make it easier on the compiler
- */
-
-typedef pte_basic_t pte_t;
-#define __pte(x)   (x)
-static inline pte_basic_t pte_val(pte_t pte)
-{
-   return pte;
-}
-
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef pte_t real_pte_t;
-#endif
-
-
-#ifdef CONFIG_PPC64
-typedef unsigned long pmd_t;
-#define __pmd(x)   (x)
-static inline unsigned long pmd_val(pmd_t pmd)
-{
-   return pmd;
-}
-
-#ifndef CONFIG_PPC_64K_PAGES
-typedef unsigned long pud_t;
-#define __pud(x)   (x)
-static inline unsigned long pud_val(pud_t pud)
-{
-   return pud;
-}
-#endif /* !CONFIG_PPC_64K_PAGES */
-#endif /* CONFIG_PPC64 */
-
-typedef unsigned long pgd_t;
-#define __pgd(x)   (x)
-static inline unsigned long pgd_val(pgd_t pgd)
-{
-   return pgd;
-}
-
-typedef unsigned long pgprot_t;
-#define pgprot_val(x)  (x)
-#define __pgprot(x)(x)
-
-#endif
+#include 
 
 typedef struct { signed long pd; } hugepd_t;
 
diff --git a/arch/powerpc/include/asm/pgtable-types.h 
b/arch/powerpc/include/asm/pgtable-types.h
new file mode 100644
index ..2fac0c4acfa4
--- /dev/null
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -0,0 +1,108 @@
+#ifndef _ASM_POWERPC_PGTABLE_TYPES_H
+#define _ASM_POWERPC_PGTABLE_TYPES_H
+
+#ifdef CONFIG_STRICT_MM_TYPECHECKS
+/* These are used to make use of C type-checking. */
+
+/* PTE level */
+typedef struct { pte_basic_t pte; } pte_t;
+#define __pte(x)   ((pte_t) { (x) })
+static inline pte_basic_t pte_val(pte_t x)
+{
+   return x.pte;
+}
+
+/* 64k pages additionally define a bigger "real PTE" type that gathers
+ * the "second half" part of the PTE for pseudo 64k pages
+ */
+#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
+typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
+#else
+typedef struct { pte_t pte; } real_pte_t;
+#endif
+
+/* PMD level */
+#ifdef CONFIG_PPC64
+typedef struct { unsigned long pmd; } pmd_t;
+#define __pmd(x)   ((pmd_t) { (x) })
+static inline unsigned long pmd_val(pmd_t x)
+{
+   return x.pmd;
+}
+
+/* PUD level exusts only on 4k pages */
+#ifndef CONFIG_PPC_64K_PAGES
+typedef struct { unsigned long pud; } pud_t;
+#define __pud(x)   ((pud_t) { (x) })
+static inline unsigned long pud_val(pud_t x)
+{
+   return x.pud;
+}
+#endif /* !CONFIG_PPC_64K_PAGES */
+#endif /* CONFIG_PPC64 */
+
+/* PGD level */
+typedef struct { unsigned long pgd; } pgd_t;
+#define __pgd(x)   ((pgd_t) { (x) })
+static inline unsigned long pgd_val(pgd_t x)
+{
+   return x.pgd;
+}
+
+/* Page protection bits */
+typedef struct { unsigned long pgprot; } pgprot_t;
+#define pgprot_val(x)  

[PATCH V3 03/30] powerpc/mm: add _PAGE_HASHPTE similar to 4K hash

2016-02-18 Thread Aneesh Kumar K.V
The difference between 64K and 4K hash fault handling is confusing with
respect to when we set _PAGE_HASHPTE in the Linux pte. I was trying to
find out whether we miss an hpte flush in any scenario because of this,
i.e. a pte update on a Linux pte for which we are doing a parallel hash
pte insert. After looking at it more closely, my understanding is that
this won't happen, because the pte update also looks at _PAGE_BUSY and
we wait for the hash pte insert to finish before going ahead with the
pte update. But to avoid further confusion, keep the hash fault handler
for all page sizes similar to __hash_page_4k.

This partially reverts commit 41743a4e34f0 ("powerpc: Free a PTE bit on ppc64 
with 64K pages"
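
The serialization described above is the usual hash fault locking
pattern: the fault handler takes _PAGE_BUSY with a cmpxchg before
touching the hash table, and pte_update() spins while _PAGE_BUSY is set,
so the two cannot overlap. A simplified sketch of that existing pattern,
mirroring the hunks below (not new code introduced by this patch):

	unsigned long old_pte, new_pte;

	do {
		pte_t pte = READ_ONCE(*ptep);

		old_pte = pte_val(pte);
		/* a parallel hash insert holds the PTE: let the caller retry */
		if (unlikely(old_pte & _PAGE_BUSY))
			return 0;
		new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
	} while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
					  old_pte, new_pte));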

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hash64_64k.c | 4 ++--
 arch/powerpc/mm/hugepage-hash64.c| 2 +-
 arch/powerpc/mm/hugetlbpage-hash64.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index b3895720edb0..ac589947c882 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -76,7 +76,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * a write access. Since this is 4K insert of 64K page size
 * also add _PAGE_COMBO
 */
-   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_COMBO;
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_COMBO | 
_PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pte |= _PAGE_DIRTY;
} while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
@@ -252,7 +252,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 * a write access. Since this is 4K insert of 64K page size
 * also add _PAGE_COMBO
 */
-   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pte |= _PAGE_DIRTY;
} while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
diff --git a/arch/powerpc/mm/hugepage-hash64.c 
b/arch/powerpc/mm/hugepage-hash64.c
index 8424f46c2bf7..bfde5aebb13d 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -46,7 +46,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * Try to lock the PTE, add ACCESSED and DIRTY if it was
 * a write access
 */
-   new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED;
+   new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pmd |= _PAGE_DIRTY;
} while (old_pmd != __cmpxchg_u64((unsigned long *)pmdp,
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index e2138c7ae70f..9c224b012d62 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -54,7 +54,7 @@ int __hash_page_huge(unsigned long ea, unsigned long access, 
unsigned long vsid,
return 1;
/* Try to lock the PTE, add ACCESSED and DIRTY if it was
 * a write access */
-   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pte |= _PAGE_DIRTY;
} while(old_pte != __cmpxchg_u64((unsigned long *)ptep,
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V3 02/30] mm: Some arch may want to use HPAGE_PMD related values as variables

2016-02-18 Thread Aneesh Kumar K.V
From: "Kirill A. Shutemov" 

The next generation POWER processor brings a new MMU model [1] that
requires us to maintain a different Linux page table format.

In order to support both current and future ppc64 systems with a single
kernel, we need to make sure the kernel can select between the different
page table formats at runtime. With the new MMU (radix MMU) added, we
will have two different pmd hugepage sizes: 16MB for the hash model and
2MB for the radix model. Hence make the HPAGE_PMD related values
variables.

The actual conversion of HPAGE_PMD to a variable for ppc64 happens in a
followup patch.

[1] http://ibm.biz/power-isa3 (Needs registration).
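
A rough sketch of what "HPAGE_PMD related values as variables" means for
ppc64 (the variable name below is purely illustrative; the real
conversion is done in the followup patch):

	/* hypothetical: hugepage PMD shift chosen at boot, not at compile time */
	extern unsigned int hpage_pmd_shift;	/* 24 for hash (16MB), 21 for radix (2MB) */

	#define HPAGE_PMD_SHIFT		hpage_pmd_shift
	#define HPAGE_PMD_SIZE		(1UL << HPAGE_PMD_SHIFT)
	#define HPAGE_PMD_ORDER		(HPAGE_PMD_SHIFT - PAGE_SHIFT)
	#define HPAGE_PMD_MASK		(~(HPAGE_PMD_SIZE - 1))

Because HPAGE_PMD_ORDER then stops being a compile-time constant, the
checks that used it either move to runtime (as in the powerpc
has_transparent_hugepage() hunk below) or use MAYBE_BUILD_BUG_ON(), which
falls back to BUG_ON() when the condition is not a constant expression.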

Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable_64.c |  7 +++
 include/linux/bug.h  |  9 +
 include/linux/huge_mm.h  |  3 ---
 mm/huge_memory.c | 17 ++---
 4 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 03f6e72697d0..2ee7bd108f59 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -818,6 +818,13 @@ pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 
 int has_transparent_hugepage(void)
 {
+
+   BUILD_BUG_ON_MSG((PMD_SHIFT - PAGE_SHIFT) >= MAX_ORDER,
+   "hugepages can't be allocated by the buddy allocator");
+
+   BUILD_BUG_ON_MSG((PMD_SHIFT - PAGE_SHIFT) < 2,
+"We need more than 2 pages to do deferred thp split");
+
if (!mmu_has_feature(MMU_FTR_16M_PAGE))
return 0;
/*
diff --git a/include/linux/bug.h b/include/linux/bug.h
index 7f4818673c41..e51b0709e78d 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -20,6 +20,7 @@ struct pt_regs;
 #define BUILD_BUG_ON_MSG(cond, msg) (0)
 #define BUILD_BUG_ON(condition) (0)
 #define BUILD_BUG() (0)
+#define MAYBE_BUILD_BUG_ON(cond) (0)
 #else /* __CHECKER__ */
 
 /* Force a compilation error if a constant expression is not a power of 2 */
@@ -83,6 +84,14 @@ struct pt_regs;
  */
 #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
 
+#define MAYBE_BUILD_BUG_ON(cond)   \
+   do {\
+   if (__builtin_constant_p((cond)))   \
+   BUILD_BUG_ON(cond); \
+   else\
+   BUG_ON(cond);   \
+   } while (0)
+
 #endif /* __CHECKER__ */
 
 #ifdef CONFIG_GENERIC_BUG
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 459fd25b378e..f12513a20a06 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -111,9 +111,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t 
*pmd,
__split_huge_pmd(__vma, __pmd, __address);  \
}  while (0)
 
-#if HPAGE_PMD_ORDER >= MAX_ORDER
-#error "hugepages can't be allocated by the buddy allocator"
-#endif
 extern int hugepage_madvise(struct vm_area_struct *vma,
unsigned long *vm_flags, int advice);
 extern void vma_adjust_trans_huge(struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index aea8f7a42df9..36c22a89df61 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -83,7 +83,7 @@ unsigned long transparent_hugepage_flags __read_mostly =
(1<= MAX_ORDER);
+   /*
+* we use page->mapping and page->index in second tail page
+* as list_head: assuming THP order >= 2
+*/
+   MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER < 2);
+
	err = hugepage_init_sysfs(&hugepage_kobj);
if (err)
goto err_sysfs;
@@ -764,7 +776,6 @@ void prep_transhuge_page(struct page *page)
 

[PATCH V3 01/30] mm: Make vm_get_page_prot arch specific.

2016-02-18 Thread Aneesh Kumar K.V
The next generation POWER processor brings a new MMU model [1] that
requires us to maintain a different Linux page table format.

In order to support both current and future ppc64 systems with a single
kernel, we need to make sure the kernel can select between the different
page table formats at runtime. With the new MMU (radix MMU) added, we
will have to dynamically switch between different protection maps. Hence
override vm_get_page_prot instead of using arch_vm_get_page_prot. We
also drop arch_vm_get_page_prot since only powerpc used it.

[1] http://ibm.biz/power-isa3 (Needs registration).

Acked-by: Kirill A. Shutemov 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h |  3 +++
 arch/powerpc/include/asm/mman.h   |  6 --
 arch/powerpc/mm/hash_utils_64.c   | 19 +++
 include/linux/mman.h  |  4 
 mm/mmap.c |  9 ++---
 5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 8d1c8162f0c1..650f7e7b5410 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -530,6 +530,9 @@ static inline pgprot_t pgprot_writecombine(pgprot_t prot)
return pgprot_noncached_wc(prot);
 }
 
+extern pgprot_t vm_get_page_prot(unsigned long vm_flags);
+#define vm_get_page_prot vm_get_page_prot
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
   pmd_t *pmdp, unsigned long old_pmd);
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 8565c254151a..9f48698af024 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -24,12 +24,6 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned 
long prot)
 }
 #define arch_calc_vm_prot_bits(prot) arch_calc_vm_prot_bits(prot)
 
-static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
-{
-   return (vm_flags & VM_SAO) ? __pgprot(_PAGE_SAO) : __pgprot(0);
-}
-#define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
-
 static inline int arch_validate_prot(unsigned long prot)
 {
if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ba59d5977f34..3199bbc654c5 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1564,3 +1564,22 @@ void setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
/* Finally limit subsequent allocations */
memblock_set_current_limit(ppc64_rma_size);
 }
+
+static pgprot_t hash_protection_map[16] = {
+   __P000, __P001, __P010, __P011, __P100,
+   __P101, __P110, __P111, __S000, __S001,
+   __S010, __S011, __S100, __S101, __S110, __S111
+};
+
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   pgprot_t prot_soa = __pgprot(0);
+
+   if (vm_flags & VM_SAO)
+   prot_soa = __pgprot(_PAGE_SAO);
+
+   return __pgprot(pgprot_val(hash_protection_map[vm_flags &
+   (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
+   pgprot_val(prot_soa));
+}
+EXPORT_SYMBOL(vm_get_page_prot);
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 16373c8f5f57..d44abea464e2 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -38,10 +38,6 @@ static inline void vm_unacct_memory(long pages)
 #define arch_calc_vm_prot_bits(prot) 0
 #endif
 
-#ifndef arch_vm_get_page_prot
-#define arch_vm_get_page_prot(vm_flags) __pgprot(0)
-#endif
-
 #ifndef arch_validate_prot
 /*
  * This is called from mprotect().  PROT_GROWSDOWN and PROT_GROWSUP have
diff --git a/mm/mmap.c b/mm/mmap.c
index 2f2415a7a688..69cfacc94f9b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -92,6 +92,10 @@ static void unmap_region(struct mm_struct *mm,
  * x: (no) no  x: (no) yes x: (no) yes x: (yes) yes
  *
  */
+/*
+ * Give arch an option to override the below in dynamic matter
+ */
+#ifndef vm_get_page_prot
 pgprot_t protection_map[16] = {
__P000, __P001, __P010, __P011, __P100, __P101, __P110, __P111,
__S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111
@@ -99,11 +103,10 @@ pgprot_t protection_map[16] = {
 
 pgprot_t vm_get_page_prot(unsigned long vm_flags)
 {
-   return __pgprot(pgprot_val(protection_map[vm_flags &
-   (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
-   pgprot_val(arch_vm_get_page_prot(vm_flags)));
+   return protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)];
 }
 EXPORT_SYMBOL(vm_get_page_prot);
+#endif
 
 static pgprot_t vm_pgprot_modify(pgprot_t oldprot, unsigned long vm_flags)
 {
-- 
2.5.0


[PATCH V3 00/30] Book3s abstraction in preparation for new MMU model

2016-02-18 Thread Aneesh Kumar K.V
Hello,

This is a large series, mostly consisting of code movement. No new features
are added in this series. The changes are done to accommodate the upcoming new
memory model in future powerpc chips. The details of the new MMU model can be
found at

 http://ibm.biz/power-isa3 (Needs registration). I am including a summary of 
the changes below.

ISA 3.0 adds support for the radix tree style of MMU with full
virtualization and related control mechanisms that manage its
coexistence with the HPT. Radix-using operating systems will
manage their own translation tables instead of relying on hcalls.

Radix style MMU model requires us to do a 4 level page table
with 64K and 4K page sizes. The table index size for each level
is listed below:

PGD -> 13 bits
PUD -> 9 (1G hugepage)
PMD -> 9 (2M huge page)
PTE -> 5 (for 64k), 9 (for 4k)
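
(Illustrative sketch only - the macro names below are made up for this
example, not taken from the series - showing how the index sizes above
add up to the same 52-bit virtual address space for both base page sizes.)

#define R_PGD_INDEX_SIZE	13
#define R_PUD_INDEX_SIZE	9
#define R_PMD_INDEX_SIZE	9
#define R_PTE_INDEX_SIZE	5	/* 9 for a 4K base page */
#define R_PAGE_SHIFT		16	/* 12 for a 4K base page */

/* 16 + 5 + 9 + 9 + 13 = 52, and 12 + 9 + 9 + 9 + 13 = 52 */
#define R_VA_BITS	(R_PAGE_SHIFT + R_PTE_INDEX_SIZE + R_PMD_INDEX_SIZE + \
			 R_PUD_INDEX_SIZE + R_PGD_INDEX_SIZE)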

We also require the page table to be in big endian format.

The changes proposed in this series enables us to support both
hash page table and radix tree style MMU using a single kernel
with limited impact. The idea is to change core page table
accessors to static inline functions and later hotpatch them
to switch to hash or radix tree functions. For example:

static inline int pte_write(pte_t pte)
{
	if (radix_enabled())
		return rpte_write(pte);
	return hlpte_write(pte);
}

On boot we will hotpatch the code so as to avoid conditional operation.
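
(A sketch of how radix_enabled() itself might look - MMU_FTR_TYPE_RADIX is
an assumed feature-bit name here, not necessarily what the series uses -
with the usual feature-fixup machinery patching the branch away at boot.)

static inline bool radix_enabled(void)
{
	return mmu_has_feature(MMU_FTR_TYPE_RADIX);
}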

The other two major changes proposed in this series are switching the hash
linux page table to a 4 level table and keeping it in big endian format. This
is done so that functions like pte_val() and pud_populate() don't need
hotpatching, which helps limit the runtime impact of the changes.
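
(A minimal sketch of what the big endian linux page table means for an
accessor, assuming the pte is stored as a __be64; one definition then works
unchanged for both hash and radix, so no hotpatching is needed.)

typedef struct { __be64 pte; } pte_t;

static inline unsigned long pte_val(pte_t x)
{
	return be64_to_cpu(x.pte);
}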

I didn't include the radix related changes in this series. You can
find them at https://github.com/kvaneesh/linux/commits/radix-mmu-v2

Changes from V2:
 * rebase to latest kernel
 * Update commit messages
 * address review comments

Changes from V1:
* move patches adding helpers to the next series

-aneesh


Aneesh Kumar K.V (29):
  mm: Make vm_get_page_prot arch specific.
  powerpc/mm: add _PAGE_HASHPTE similar to 4K hash
  powerpc/mm: Split pgtable types to separate header
  powerpc/mm: Don't have conditional defines for real_pte_t
  powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table
  powerpc/mm: Copy pgalloc (part 1)
  powerpc/mm: Copy pgalloc (part 2)
  powerpc/mm: Copy pgalloc (part 3)
  powerpc/mm: Hugetlbfs is book3s_64 and fsl_book3e (32 or 64)
  powerpc/mm: free_hugepd_range split to hash and nonhash
  powerpc/mm: Use helper instead of opencoding
  powerpc/mm: Move hash64 specific definitions to separate header
  powerpc/mm: Move swap related definition ot hash64 header
  powerpc/mm: Move hash page table related functions to pgtable-hash64.c
  powerpc/mm: Rename hash specific page table bits (_PAGE* -> H_PAGE*)
  powerpc/mm: Use flush_tlb_page in ptep_clear_flush_young
  powerpc/mm: THP is only available on hash64 as of now
  powerpc/mm: Use generic version of pmdp_clear_flush_young
  powerpc/mm: Create a new headers for tlbflush for hash64
  powerpc/mm: Hash linux abstraction for page table accessors
  powerpc/mm: Hash linux abstraction for functions in pgtable-hash.c
  powerpc/mm: Hash linux abstraction for mmu context handling code
  powerpc/mm: Move hash related mmu-*.h headers to book3s/
  powerpc/mm: Hash linux abstractions for early init routines
  powerpc/mm: Hash linux abstraction for THP
  powerpc/mm: Hash linux abstraction for HugeTLB
  powerpc/mm: Hash linux abstraction for page table allocator
  powerpc/mm: Hash linux abstraction for tlbflush routines
  powerpc/mm: Hash linux abstraction for pte swap encoding

Kirill A. Shutemov (1):
  mm: Some arch may want to use HPAGE_PMD related values as variables

 arch/powerpc/Kconfig   |   1 +
 .../asm/{mmu-hash32.h => book3s/32/mmu-hash.h} |   6 +-
 arch/powerpc/include/asm/book3s/32/pgalloc.h   | 109 
 arch/powerpc/include/asm/book3s/32/pgtable.h   |  13 +
 arch/powerpc/include/asm/book3s/64/hash-4k.h   | 103 ++-
 arch/powerpc/include/asm/book3s/64/hash-64k.h  | 165 ++---
 arch/powerpc/include/asm/book3s/64/hash.h  | 525 ---
 .../asm/{mmu-hash64.h => book3s/64/mmu-hash.h} |  67 +-
 arch/powerpc/include/asm/book3s/64/mmu.h   |  92 +++
 .../include/asm/book3s/64/pgalloc-hash-4k.h|  92 +++
 .../include/asm/book3s/64/pgalloc-hash-64k.h   |  48 ++
 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h  |  82 +++
 arch/powerpc/include/asm/book3s/64/pgalloc.h   | 158 +
 arch/powerpc/include/asm/book3s/64/pgtable.h   | 713 ++---
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h |  96 +++
 arch/powerpc/include/asm/book3s/64/tlbflush.h  |  56 ++
 arch/powerpc/include/asm/book3s/pgalloc.h  |  19 +
 arch/powerpc/include/asm/book3s/pgtable.h  |   4 -
 arch/powerpc/include/asm/hugetlb.h |   5 +-
 

Re: [PATCH] add POWER Virtual Management Channel driver

2016-02-18 Thread Steven Royer

On 2016-02-17 23:30, Stewart Smith wrote:

Steven Royer  writes:

On 2016-02-17 16:31, Greg Kroah-Hartman wrote:

On Wed, Feb 17, 2016 at 03:18:26PM -0600, Steven Royer wrote:

On 2016-02-16 16:18, Greg Kroah-Hartman wrote:
>On Tue, Feb 16, 2016 at 02:43:13PM -0600, Steven Royer wrote:
>>From: Steven Royer 
>>
>>The ibmvmc driver is a device driver for the POWER Virtual Management
>>Channel virtual adapter on the PowerVM platform.  It is used to
>>communicate with the hypervisor for virtualization management.  It
>>provides both request/response and asynchronous message support through
>>the /dev/ibmvmc node.
>
>What is the protocol for that device node?
The protocol is not currently published.  I am pushing on getting it
published, but that process will take time.  If you have a PowerVM
system with NovaLink, it would not be hard to reverse engineer it...
If you don't have a PowerVM system, then this driver isn't interesting
anyway...


Steven - if you need some help pushing for it to be published, let me
know, there's a few internal things I could help push.

Thanks



You can't just expect us to review this code without at least having a
clue as to how it is supposed to work?

There are two layers to the protocol.  The first layer is the only layer
that the driver actually cares about.  The second layer is just a
payload that is between the application and the hypervisor and can
change independently from the kernel/driver (this is what is transported
over the /dev/ibmvmc node).  The first layer technically is published in
the PAPR (appendix G), but it is not trivial for most people to access


https://members.openpowerfoundation.org/document/dl/469 is LoPAPR which
has been published through OpenPower Foundation and anyone can access,
although Appendix G there is on EEH. Although VMC (Virtual Management
Channel) is mentioned in that document the details aren't there... so
it's possible that this is only in some other PAPR version :/
and... looking in internal places, it is. *sigh*

With my OpenPower Foundation hat on, I'll say that it's a
work-in-progress getting all this documentation in order.

The questions of if it's a sensible hypervisor to partition interface
and if it's a sensible userspace API are open for debate :)

Would we implement this way of communicating between a KVM guest and the
host linux system? If not, then it's probably not a generally good
idea. That being said, it seems to be what already exists in PowerVM


There is no "host" OS on PowerVM.  The ibmvmc device makes it possible 
to emulate that behavior by picking one of the LPARs to be privileged.  
So this isn't really similar to a KVM guest talking to the KVM host.  
It's more like this Linux OS becomes the host.  ibmvmc is the pipe that 
enables virtualization management software (i.e., OpenStack via 
NovaLink) to manage PowerVM: create/destroy/modify guests, etc...  The 
why's and how's of NovaLink are described simply here: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/Introducing%20PowerVM%20NovaLink


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/mm/hash: Clear the invalid slot information correctly

2016-02-18 Thread Aneesh Kumar K.V
We can get a hash pte fault with 4k base page size and find the pte
already inserted with 64K base page size. In that case we need to clear
the existing slot information from the old pte. Fix this correctly

With THP, we also clear the slot information with respect to all
the 64K hash pte mapping that 16MB page. They are all invalid
now. This make sure we don't find the slot valid when we fault with
4k base page size. Finding the slot valid should not result in any wrong
behavior because we do check again in hash page table for the validity.
But we can avoid that check completely.

Fixes: a43c0eb8364c022 ("powerpc/mm: Convert 4k hash insert to C")

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hash64_4k.c   |  2 +-
 arch/powerpc/mm/hash64_64k.c  | 12 +---
 arch/powerpc/mm/hugepage-hash64.c |  7 ++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index e7c04542ba62..e3e76b929f33 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -106,7 +106,7 @@ repeat:
}
}
/*
-* Hypervisor failure. Restore old pmd and return -1
+* Hypervisor failure. Restore old pte and return -1
 * similar to __hash_page_*
 */
if (unlikely(slot == -2)) {
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 0762c1e08c88..b3895720edb0 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -111,7 +111,13 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 */
if (!(old_pte & _PAGE_COMBO)) {
flush_hash_page(vpn, rpte, MMU_PAGE_64K, ssize, flags);
-   old_pte &= ~_PAGE_HASHPTE | _PAGE_F_GIX | _PAGE_F_SECOND;
+   /*
+* clear the old slot details from the old and new pte.
+* On hash insert failure we use old pte value and we don't
+* want slot information there if we have a insert failure.
+*/
+   old_pte &= ~(_PAGE_HASHPTE | _PAGE_F_GIX | _PAGE_F_SECOND);
+   new_pte &= ~(_PAGE_HASHPTE | _PAGE_F_GIX | _PAGE_F_SECOND);
goto htab_insert_hpte;
}
/*
@@ -182,7 +188,7 @@ repeat:
}
}
/*
-* Hypervisor failure. Restore old pmd and return -1
+* Hypervisor failure. Restore old pte and return -1
 * similar to __hash_page_*
 */
if (unlikely(slot == -2)) {
@@ -305,7 +311,7 @@ repeat:
}
}
/*
-* Hypervisor failure. Restore old pmd and return -1
+* Hypervisor failure. Restore old pte and return -1
 * similar to __hash_page_*
 */
if (unlikely(slot == -2)) {
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 49b152b0f926..8424f46c2bf7 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -78,9 +78,14 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 * base page size. This is because demote_segment won't flush
 * hash page table entries.
 */
-   if ((old_pmd & _PAGE_HASHPTE) && !(old_pmd & _PAGE_COMBO))
+   if ((old_pmd & _PAGE_HASHPTE) && !(old_pmd & _PAGE_COMBO)) {
flush_hash_hugepage(vsid, ea, pmdp, MMU_PAGE_64K,
ssize, flags);
+   /*
+* clear the old slot information 
+*/
+   memset(hpte_slot_array, 0, PTE_FRAG_SIZE);
+   }
}
 
valid = hpte_valid(hpte_slot_array, index);
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-18 Thread Gerald Schaefer
On Thu, 18 Feb 2016 01:58:08 +0200
"Kirill A. Shutemov"  wrote:

> On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> > Sebastian Ott  wrote:
> > 
> > > [   59.875935] [ cut here ]
> > > [   59.875937] kernel BUG at mm/huge_memory.c:2884!
> > > [   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP 
> > > DEBUG_PAGEALLOC
> > > [   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan 
> > > ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core 
> > > ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic 
> > > sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card 
> > > scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan 
> > > kvm autofs4
> > > [   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: GW   
> > > 4.4.0-07794-ga4eff16-dirty #77
> > > [   59.876036] task: d2312948 ti: cfecc000 task.ti: 
> > > cfecc000
> > > [   59.876039] Krnl PSW : 0704d0018000 002bf3aa 
> > > (__split_huge_pmd_locked+0x562/0xa10)
> > > [   59.876045]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 
> > > PM:0 EA:3
> > >Krnl GPRS: 01a7a1cf 03d10177c000 
> > > 00044068 5df00215
> > > [   59.876051]0001 0001 
> > >  774e6900
> > > [   59.876054]03ff5200 6d403b10 
> > > 6e1eb800 03ff51f0
> > > [   59.876058]03d10177c000 00715190 
> > > 002bf234 cfecfb58
> > > [   59.876068] Krnl Code: 002bf39c: d507d010a000  clc 
> > > 16(8,%%r13),0(%%r10)
> > >   002bf3a2: a7840004  brc 
> > > 8,2bf3aa
> > >  #002bf3a6: a7f40001  brc 
> > > 15,2bf3a8
> > >  >002bf3aa: 91407440  tm  
> > > 1088(%%r7),64
> > >   002bf3ae: a7840208  brc 
> > > 8,2bf7be
> > >   002bf3b2: a7f401e9  brc 
> > > 15,2bf784
> > >   002bf3b6: 9104a006  tm  
> > > 6(%%r10),4
> > >   002bf3ba: a7740004  brc 
> > > 7,2bf3c2
> > > [   59.876089] Call Trace:
> > > [   59.876092] ([<002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > > [   59.876095]  [<002c4310>] __split_huge_pmd+0x118/0x218
> > > [   59.876099]  [<002810e8>] unmap_single_vma+0x2d8/0xb40
> > > [   59.876102]  [<00282d66>] zap_page_range+0x116/0x318
> > > [   59.876105]  [<0029b834>] SyS_madvise+0x23c/0x5e8
> > > [   59.876108]  [<006f9f56>] system_call+0xd6/0x258
> > > [   59.876111]  [<03ff9bbfd282>] 0x3ff9bbfd282
> > > [   59.876113] INFO: lockdep is turned off.
> > > [   59.876115] Last Breaking-Event-Address:
> > > [   59.876118]  [<002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> > 
> > The BUG at mm/huge_memory.c:2884 is interesting, it's the 
> > BUG_ON(!pte_none(*pte))
> > check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> > pagetables to be empty, but in collapse_huge_page() we deposit the original
> > pagetable instead of allocating a new (empty) one. This saves an allocation,
> > which is good, but doesn't that mean that if such a collapsed hugepage will
> > ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> > of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> > 
> > This behavior is not new, it was the same before the THP rework, so I do not
> > assume that it is related to the current problems, maybe with the exception
> > of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> > and the other crashes probably cannot be explained with this. Maybe I am
> > also missing something, but I do not see how collapse_huge_page() and the
> > (non-empty) pgtable deposit there can work out with the 
> > BUG_ON(!pte_none(*pte))
> > checks. Any thoughts?
> 
> I don't think there's a problem: ptes in the pgtable are cleared with
> pte_clear() in __collapse_huge_page_copy().
> 

Ah OK, I didn't see that. Still the BUG_ON() tells us that something went
wrong with the pre-allocated pagetable, or at least with the deposit/withdraw
list, or both. Given that on s390 we keep the listheads for the deposit/withdraw
list inside the pre-allocated pgtables, instead of the struct pages, it may
also explain why we don't see the problems on x86.
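
(For reference, a simplified sketch of the generic deposit side - not the
s390 variant - where the deposited page tables are chained through the
struct page lru list hanging off pmd_huge_pte; written from memory, not the
exact mm/pgtable-generic.c code.)

void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
				pgtable_t pgtable)
{
	assert_spin_locked(pmd_lockptr(mm, pmdp));

	/* FIFO: chain further pgtables off the first deposited one */
	if (!pmd_huge_pte(mm, pmdp))
		INIT_LIST_HEAD(&pgtable->lru);
	else
		list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
	pmd_huge_pte(mm, pmdp) = pgtable;
}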

We already have the list corruption warning in exit_mmap -> zap_huge_pmd ->
withdraw, and from time to time I also hit the BUG_ON(page->pmd_huge_pte)
in exit_mmap -> free_pgtables -> free_pmd_range, which also indicates some
issues with the 

Re: [RFC 4/4] powerpc/mm: Rename global tracker for virtual to physical mapping

2016-02-18 Thread Michael Ellerman
On Wed, 2016-02-17 at 17:42 +0530, Anshuman Khandual wrote:

> This renames the global list which tracks all the virtual to physical
> mapping and also the global list which tracks all the available unused
> vmemmap_hw_map node structures.

But why? Why are the new names *so* much better that we would want to go
through all this churn?

> It also attempts to explain the purpose
> of these global linked lists and points out a possible race condition.

I'm happy to take the comments.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 1/4] powerpc/mm: Rename variable to reflect start address of a section

2016-02-18 Thread Michael Ellerman
On Wed, 2016-02-17 at 17:42 +0530, Anshuman Khandual wrote:

> The commit (16a05bff1: powerpc: start loop at section start of
> start in vmemmap_populated()) reused 'start' variable to compute
> the starting address of the memory section where the given address
> belongs. Then the same variable is used for iterating over starting
> address of all memory sections before reaching the 'end' address.
> Renaming it as 'section_start' makes the logic more clear.
> 
> Fixes: 16a05bff1 ("powerpc: start loop at section start of start in vmemmap_populated()")

It's not a fix, just a cleanup. Fixes lines should be reserved for actual bug
fixes.

> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/mm/init_64.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 379a6a9..d6b9b4d 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -170,11 +170,15 @@ static unsigned long __meminit vmemmap_section_start(unsigned long page)
>   */
>  static int __meminit vmemmap_populated(unsigned long start, int page_size)
>  {
> - unsigned long end = start + page_size;
> - start = (unsigned long)(pfn_to_page(vmemmap_section_start(start)));
> + unsigned long end, section_start;
>  
> - for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page)))
> - if (pfn_valid(page_to_pfn((struct page *)start)))
> + end = start + page_size;
> + section_start = (unsigned long)(pfn_to_page
> + (vmemmap_section_start(start)));
> +
> + for (; section_start < end; section_start
> + += (PAGES_PER_SECTION * sizeof(struct page)))
> + if (pfn_valid(page_to_pfn((struct page *)section_start)))
>   return 1;
>  
>   return 0;

That's not a big improvement.

But I think this code could be improved. There are a lot of casts, and it
seems to be confused about whether it's iterating over addresses or struct pages.
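
Something like this, for example (an untested sketch of the kind of cleanup
I mean - iterate over struct page pointers directly instead of casting back
and forth):

static int __meminit vmemmap_populated(unsigned long start, int page_size)
{
	unsigned long end = start + page_size;
	struct page *p = pfn_to_page(vmemmap_section_start(start));

	/* step one memory section worth of struct pages at a time */
	for (; (unsigned long)p < end; p += PAGES_PER_SECTION)
		if (pfn_valid(page_to_pfn(p)))
			return 1;

	return 0;
}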

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 2/4] powerpc/mm: Add comments to the vmemmap layout

2016-02-18 Thread Michael Ellerman
On Wed, 2016-02-17 at 17:42 +0530, Anshuman Khandual wrote:

> Add some explaination to the layout of vmemmap virtual address
> space and how physical page mapping is only used for valid PFNs
> present at any point on the system.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 41 
> 
>  1 file changed, 41 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 8d1c41d..9db4a86 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -26,6 +26,47 @@
>  #define IOREMAP_BASE (PHB_IO_END)
>  #define IOREMAP_END  (KERN_VIRT_START + KERN_VIRT_SIZE)
>  
> +/*
> + * Starting address of the virtual address space where all page structs

This is so far from the variable it's referring to that it's not clear what it
refers to. So you should say "vmemmap is the starting ..."

> + * for the system physical memory are stored under the vmemmap sparse
  ^
  , when using the SPARSEMEM_VMEMMAP
> + * memory model. All possible struct pages are logically stored in a
> + * sequence in this virtual address space irrespective of the fact
> + * whether any given PFN is valid or even the memory section is valid
> + * or not.

I know what you mean but I think that could be worded better. But it's too late
for me to reword it :)

The key point is that we allocate space for a page struct for each PFN that
could be present in the system, including holes in the address space (hence
sparse). That has the nice property that there is a constant relationship
between the address of a struct page and its PFN.
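
(That constant relationship is what makes the pfn <-> struct page conversion
cheap: with a virtually contiguous vmemmap array it is plain pointer
arithmetic, roughly as in the generic SPARSEMEM_VMEMMAP definitions.)

/* sketch of the generic SPARSEMEM_VMEMMAP conversion */
#define __pfn_to_page(pfn)	(vmemmap + (pfn))
#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)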

> + * During boot and memory hotplug add operation when new memory
  ^   ^
  or  ,
> + * sections are added, real physical allocation and hash table bolting
  ^
  of struct pages

> + * will be performed. This saves precious physical memory when the system
> + * really does not have valid PFNs in some address ranges.


> + *
> + *  vmemmap +--+
> + * +|  page struct +--+  PFN is valid
> + * |+--+  |
> + * ||  page struct |  |  PFN is invalid
> + * |+--+  |
> + * ||  page struct +--+   |
> + * |+--+  |   |
> + * ||  page struct |  |   |
> + * |+--+  |   |
> + * ||  page struct |  |   |
> + * |+--+  |   |
> + * ||  page struct +--+   |   |
> + * |+--+  |   |   |
> + * ||  page struct |  |   |   |   +-+
> + * |+--+  |   |   +-> | PFN |
> + * ||  page struct |  |   |   +-+
> + * |+--+  |   +-> | PFN |
> + * ||  page struct |  |   +-+
> + * |+--+  +-> | PFN |
> + * ||  page struct |  +-+
> + * |+--+   +> | PFN |
> + * ||  page struct |   |  +-+
> + * |+--+   |Bolted in hash table
> + * ||  page struct +---+
> + * v+--+


The things on the right are not PFNs, they're struct pages. Each one
corresponds to a PFN, but that relationship is derived from the vmemmap layout,
not the physical layout.

I think it's more like:

f000  c000 (and also 0x0)
vmemmap +--+  +--+
   +|  page struct | +--> |  page struct |
   |+--+  +--+
   ||  page struct | +--> |  page struct |
   |+--+ |+--+
   ||  page struct | +   +--> |  page struct |
   |+--+ |+--+
   ||  page struct | |   +--> |  page struct |
   |+--+ |   |+--+
   ||  page struct | |   |
   |+--+ |   |
   ||  page struct | |   |
   |+--+ |   |
   ||  page struct | |   |
   |+--+ |   |
   ||  page struct | |   |
   |+--+ |   |
   ||  page struct | +---+   |
   |+--+ |
   ||  page struct | +---+
   |+--+
   ||  page struct | No mapping
   |+--+
   ||  page 

Re: [RFC 3/4] powerpc/mm: Rename the vmemmap_backing struct and its elements

2016-02-18 Thread Michael Ellerman
On Wed, 2016-02-17 at 20:22 +0530, Aneesh Kumar K.V wrote:

> Anshuman Khandual  writes:
>

> > The structure to track single virtual to physical mapping has
> > been renamed from vmemmap_backing to vmemmap_hw_map which sounds
> > more appropriate. This forms a single entry of the global linked
> > list tracking all of the vmemmap physical mapping. The changes
> > are as follows.
> >
> > vmemmap_backing.list -> vmemmap_hw_map.link
> > vmemmap_backing.phys -> vmemmap_hw_map.paddr
> > vmemmap_backing.virt_addr -> vmemmap_hw_map.vaddr
>
> I am not sure this helps. If we are going to take these renames, can you
> wait till the book3s p9 preparation patches [1] hit upstream ?

I don't see why the new names are any better, and it's a lot of churn for
minimal if any benefit. So I'm not going to take this one.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] powerpc: Add POWER9 cputable entry

2016-02-18 Thread Michael Ellerman
On Thu, 2016-02-18 at 14:32 +1100, Michael Neuling wrote:
> On Wed, 2016-02-17 at 22:09 +1100, Michael Ellerman wrote:
> > On Wed, 2016-02-17 at 16:07 +1100, Michael Neuling wrote:
> >
> > > Add a cputable entry for POWER9.  More code is required to actually
> > > boot and run on a POWER9 but this gets the base piece in which we
> > > can
> > > start building on.
> > >
> > > Copies over from POWER8 except for:
> > > - Adds a new CPU_FTR_ARCH_30 bit to start hanging new architecture
> >
> > ARCH thirty?
> >
> > Would CPU_FTR_ARCH_3 read better?
> >
> > Or CPU_FTR_ARCH_3_00 ?
>
> The actual architecture book used to say 2.07 but now says just 3.0.
> Hence why I picked 30 vs 207.

Yeah I get the logic.

> That being said, I don't really care what we call it.

I like CPU_FTR_ARCH_3.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev