Re: [PATCH V4 03/18] powerpc/mm: add _PAGE_HASHPTE similar to 4K hash

2016-02-22 Thread Paul Mackerras
On Tue, Feb 23, 2016 at 10:18:05AM +0530, Aneesh Kumar K.V wrote:
> The difference between 64K and 4K hash fault handling is confusing
> with respect to when we set _PAGE_HASHPTE in the linux pte.
> I was trying to find out whether we miss an hpte flush in any
> scenario because of this, i.e. a pte update on a linux pte for which we
> are doing a parallel hash pte insert. After looking at it more closely, my
> understanding is that this won't happen, because the pte update also looks
> at _PAGE_BUSY and we will wait for the hash pte insert to finish before
> going ahead with the pte update. But to avoid further confusion, keep the
> hash fault handler for all page sizes similar to __hash_page_4K.
> 
> This partially reverts commit 41743a4e34f0 ("powerpc: Free a PTE bit on
> ppc64 with 64K pages").
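
For illustration, a rough sketch of the serialization described above
(simplified, not the patch itself): the pte update loop retries while
_PAGE_BUSY is set, so a parallel hash insert, which holds _PAGE_BUSY
while it runs, always finishes before the update proceeds.

	for (;;) {
		old_pte = pte_val(READ_ONCE(*ptep));
		/* a parallel __hash_page_*() holds _PAGE_BUSY; wait for it */
		if (old_pte & _PAGE_BUSY) {
			cpu_relax();
			continue;
		}
		new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
		/* try to lock the pte: only one updater wins */
		if (__cmpxchg_u64((unsigned long *)ptep, old_pte,
				  new_pte) == old_pte)
			break;
	}
	/* ... do the update, then clear _PAGE_BUSY when done ... */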

In each of the functions you are modifying below, there is already an
explicit setting of _PAGE_HASHPTE in new_pte.  So I don't think this
is necessary, or if we do this, we can eliminate the separate setting
of _PAGE_HASHPTE later on.

In general I think it's better to leave the setting of _PAGE_HASHPTE
until we know what slot the HPTE is going to go into.  That way we
have less chance of ending up with _PAGE_HASHPTE set but bogus
information in _PAGE_F_GIX and _PAGE_F_SECOND.
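
Concretely, the two orderings being weighed (a sketch using the
identifiers from the hunks quoted below, not actual patch text):

	/* (a) as in this patch: set _PAGE_HASHPTE while locking the PTE */
	new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
	/* ... window where _PAGE_F_GIX/_PAGE_F_SECOND may still be stale ... */

	/* (b) set it only once the HPTE slot is known */
	new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE;
	new_pte |= (slot << _PAGE_F_GIX_SHIFT) & (_PAGE_F_SECOND | _PAGE_F_GIX);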

> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/hash64_64k.c | 4 ++--
>  arch/powerpc/mm/hugepage-hash64.c| 2 +-
>  arch/powerpc/mm/hugetlbpage-hash64.c | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
> index b2d659cf51c6..507c1e55a424 100644
> --- a/arch/powerpc/mm/hash64_64k.c
> +++ b/arch/powerpc/mm/hash64_64k.c
> @@ -76,7 +76,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
> unsigned long vsid,
>* a write access. Since this is 4K insert of 64K page size
>* also add _PAGE_COMBO
>*/
> - new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_COMBO;
> + new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_COMBO | _PAGE_HASHPTE;
>   if (access & _PAGE_RW)
>   new_pte |= _PAGE_DIRTY;
>   } while (old_pte != __cmpxchg_u64((unsigned long *)ptep,

Later on in the same function:

/*
 * Insert slot number & secondary bit in PTE second half,
 * clear _PAGE_BUSY and set appropriate HPTE slot bit
 * Since we have _PAGE_BUSY set on ptep, we can be sure
 * nobody is updating hidx.
 */
hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
rpte.hidx &= ~(0xfUL << (subpg_index << 2));
*hidxp = rpte.hidx  | (slot << (subpg_index << 2));
new_pte = mark_subptegroup_valid(new_pte, subpg_index);
new_pte |=  _PAGE_HASHPTE;
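
For context (a sketch of the layout, not part of the patch): with 64K
linux pages backed by 4K hash pages, rpte.hidx packs one 4-bit slot
value per 4K subpage, which is why the code above shifts by
(subpg_index << 2):

	/*
	 * rpte.hidx layout: subpage n's hash slot nibble occupies
	 * bits [4n+3 : 4n], so clearing and inserting a slot both
	 * shift by subpg_index * 4.
	 */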

> @@ -251,7 +251,7 @@ int __hash_page_64K(unsigned long ea, unsigned long 
> access,
>* Try to lock the PTE, add ACCESSED and DIRTY if it was
>* a write access.
>*/
> - new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
> + new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
>   if (access & _PAGE_RW)
>   new_pte |= _PAGE_DIRTY;
>   } while (old_pte != __cmpxchg_u64((unsigned long *)ptep,

later on:

new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE;
new_pte |= (slot << _PAGE_F_GIX_SHIFT) & (_PAGE_F_SECOND | _PAGE_F_GIX);

> diff --git a/arch/powerpc/mm/hugepage-hash64.c 
> b/arch/powerpc/mm/hugepage-hash64.c
> index eb2accdd76fd..56d677b7972c 100644
> --- a/arch/powerpc/mm/hugepage-hash64.c
> +++ b/arch/powerpc/mm/hugepage-hash64.c
> @@ -46,7 +46,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, 
> unsigned long vsid,
>* Try to lock the PTE, add ACCESSED and DIRTY if it was
>* a write access
>*/
> - new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED;
> + new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
>   if (access & _PAGE_RW)
>   new_pmd |= _PAGE_DIRTY;
>   } while (old_pmd != __cmpxchg_u64((unsigned long *)pmdp,

later:

hash = hpt_hash(vpn, shift, ssize);
/* insert new entry */
pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT;
new_pmd |= _PAGE_HASHPTE;

> diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
> b/arch/powerpc/mm/hugetlbpage-hash64.c
> index 8555fce902fe..08efcad7cae0 100644
> --- a/arch/powerpc/mm/hugetlbpage-hash64.c
> +++ b/arch/powerpc/mm/hugetlbpage-hash64.c
> @@ -54,7 +54,7 @@ int __hash_page_huge(unsigned long ea, unsigned long 
> access, unsigned long vsid,
>   return 1;
>   /* Try to lock the PTE, add ACCESSED and DIRTY if it was
>* a write access */
> - new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
> + new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;

Re: [PATCH] powerpc/pagetable: Add option to dump kernel pagetable

2016-02-22 Thread Rashmica



On 23/02/16 16:30, Anshuman Khandual wrote:

On 02/23/2016 03:57 AM, Rashmica wrote:

Hi Anshuman,

Thanks for the feedback!

On 22/02/16 21:13, Anshuman Khandual wrote:

On 02/22/2016 11:32 AM, Rashmica Gupta wrote:

Useful to be able to dump the kernel page tables to check permissions
and memory types - derived from arm64's implementation.

Add a debugfs file to check the page tables. To use this the PPC_PTDUMP
config option must be selected.

Tested on 64BE and 64LE with both 4K and 64K page sizes.
---

This statement above must be after the "---" line, else it will be part of
the commit message - or did you want the test note as part of the commit
message itself?

The patch seems to contain some white space problems. Please clean
them up.

Will do!

   arch/powerpc/Kconfig.debug |  12 ++
   arch/powerpc/mm/Makefile   |   1 +
   arch/powerpc/mm/dump.c | 364
+
   3 files changed, 377 insertions(+)
   create mode 100644 arch/powerpc/mm/dump.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 638f9ce740f5..e4883880abe3 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -344,4 +344,16 @@ config FAIL_IOMMU
   If you are unsure, say N.
   +config PPC_PTDUMP
+bool "Export kernel pagetable layout to userspace via debugfs"
+depends on DEBUG_KERNEL
+select DEBUG_FS
+help
+  This options dumps the state of the kernel pagetables in a
debugfs
+  file. This is only useful for kernel developers who are
working in
+  architecture specific areas of the kernel - probably not a
good idea to
+  enable this feature in a production kernel.

Just clean the paragraph alignment here
..


+
+  If you are unsure, say N.
+
   endmenu
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 1ffeda85c086..16f84bdd7597 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -40,3 +40,4 @@ obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
   obj-$(CONFIG_HIGHMEM)+= highmem.o
   obj-$(CONFIG_PPC_COPRO_BASE)+= copro_fault.o
   obj-$(CONFIG_SPAPR_TCE_IOMMU)+= mmu_context_iommu.o
+obj-$(CONFIG_PPC_PTDUMP)+= dump.o

Would a file name like "[kernel_]pgtable_dump.c" sound better? Or
just use the x86 one, "dump_pagetables.c"? "dump.c" sounds
very generic.

Yup, good point.

diff --git a/arch/powerpc/mm/dump.c b/arch/powerpc/mm/dump.c
new file mode 100644
index ..937b10fc40cc
--- /dev/null
+++ b/arch/powerpc/mm/dump.c
@@ -0,0 +1,364 @@
+/*
+ * Copyright 2016, Rashmica Gupta, IBM Corp.
+ *
+ * Debug helper to dump the current kernel pagetables of the system
+ * so that we can see what the various memory ranges are set to.
+ *
+ * Derived from the arm64 implementation:
+ * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
+ * (C) Copyright 2008 Intel Corporation, Arjan van de Ven.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PUD_TYPE_MASK   (_AT(u64, 3) << 0)
+#define PUD_TYPE_SECT   (_AT(u64, 1) << 0)
+#define PMD_TYPE_MASK   (_AT(u64, 3) << 0)
+#define PMD_TYPE_SECT   (_AT(u64, 1) << 0)
+
+
+#if CONFIG_PGTABLE_LEVELS == 2
+#include 
+#elif CONFIG_PGTABLE_LEVELS == 3
+#include 
+#endif

Really? Do we have any platform with only 2 levels of page tables?
   

I'm not sure - was trying to cover all the bases. If you're
confident that we don't, I can remove it.

I am not sure though, may be Michael or Mikey can help here.


+
+#define pmd_sect(pmd)  ((pmd_val(pmd) & PMD_TYPE_MASK) ==
PMD_TYPE_SECT)
+#ifdef CONFIG_PPC_64K_PAGES
+#define pud_sect(pud)   (0)
+#else
+#define pud_sect(pud)   ((pud_val(pud) & PUD_TYPE_MASK) == \
+   PUD_TYPE_SECT)
+#endif

Can you please explain the use of pmd_sect() and pud_sect() defines ?


+
+
+struct addr_marker {
+unsigned long start_address;
+const char *name;
+};

All the architectures are using the same structure addr_marker.
Can't we just move it to a generic header file? There are
other common structures like this in the file which are
used across architectures and could be moved somewhere common.

Could do that. Where do you think would be the appropriate place
for such a header file?

We can start at include/linux/mmdebug.h header file.
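
If it did move there, a shared definition might look like this (a sketch
only; the mmdebug.h location is just the suggestion above):

	/* include/linux/mmdebug.h (hypothetical) */
	struct addr_marker {
		unsigned long start_address;
		const char *name;
	};

with each architecture keeping only its own markers table, e.g. on
powerpc something like:

	static struct addr_marker address_markers[] = {
		{ VMALLOC_START, "vmalloc() Area" },
		{ VMALLOC_END,   "vmalloc() End" },
		{ ISA_IO_BASE,   "isa I/O start" },
	};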


+
+enum address_markers_idx {
+VMALLOC_START_NR = 0,
+VMALLOC_END_NR,
+ISA_IO_START_NR,
+ISA_IO_END_NR,
+PHB_IO_START_NR,
+PHB_IO_END_NR,
+IOREMAP_START_NR,
+IOREMP_END_NR,
+};

Where are these used? ^ I don't see them anywhere.

Whoops, yes those are not used.

Re: [PATCH] powerpc/pagetable: Add option to dump kernel pagetable

2016-02-22 Thread Anshuman Khandual
On 02/23/2016 03:57 AM, Rashmica wrote:
> Hi Anshuman,
> 
> Thanks for the feedback!
> 
> On 22/02/16 21:13, Anshuman Khandual wrote:
>> On 02/22/2016 11:32 AM, Rashmica Gupta wrote:
>>> Useful to be able to dump the kernel page tables to check permissions
>>> and memory types - derived from arm64's implementation.
>>>
>>> Add a debugfs file to check the page tables. To use this the PPC_PTDUMP
>>> config option must be selected.
>>>
>>> Tested on 64BE and 64LE with both 4K and 64K page sizes.
>>> ---
>> This statement above must be after the "---" line, else it will be part of
>> the commit message - or did you want the test note as part of the commit
>> message itself?
>>
>> The patch seems to contain some white space problems. Please clean
>> them up.
> Will do!
>>>   arch/powerpc/Kconfig.debug |  12 ++
>>>   arch/powerpc/mm/Makefile   |   1 +
>>>   arch/powerpc/mm/dump.c | 364
>>> +
>>>   3 files changed, 377 insertions(+)
>>>   create mode 100644 arch/powerpc/mm/dump.c
>>>
>>> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
>>> index 638f9ce740f5..e4883880abe3 100644
>>> --- a/arch/powerpc/Kconfig.debug
>>> +++ b/arch/powerpc/Kconfig.debug
>>> @@ -344,4 +344,16 @@ config FAIL_IOMMU
>>>   If you are unsure, say N.
>>>   +config PPC_PTDUMP
>>> +bool "Export kernel pagetable layout to userspace via debugfs"
>>> +depends on DEBUG_KERNEL
>>> +select DEBUG_FS
>>> +help
>>> +  This options dumps the state of the kernel pagetables in a
>>> debugfs
>>> +  file. This is only useful for kernel developers who are
>>> working in
>>> +  architecture specific areas of the kernel - probably not a
>>> good idea to
>>> +  enable this feature in a production kernel.
>> Just clean the paragraph alignment here
>> ..
>>
>>> +
>>> +  If you are unsure, say N.
>>> +
>>>   endmenu
>>> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
>>> index 1ffeda85c086..16f84bdd7597 100644
>>> --- a/arch/powerpc/mm/Makefile
>>> +++ b/arch/powerpc/mm/Makefile
>>> @@ -40,3 +40,4 @@ obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
>>>   obj-$(CONFIG_HIGHMEM)+= highmem.o
>>>   obj-$(CONFIG_PPC_COPRO_BASE)+= copro_fault.o
>>>   obj-$(CONFIG_SPAPR_TCE_IOMMU)+= mmu_context_iommu.o
>>> +obj-$(CONFIG_PPC_PTDUMP)+= dump.o
>> Would a file name like "[kernel_]pgtable_dump.c" sound better? Or
>> just use the x86 one, "dump_pagetables.c"? "dump.c" sounds
>> very generic.
> Yup, good point.
>>> diff --git a/arch/powerpc/mm/dump.c b/arch/powerpc/mm/dump.c
>>> new file mode 100644
>>> index ..937b10fc40cc
>>> --- /dev/null
>>> +++ b/arch/powerpc/mm/dump.c
>>> @@ -0,0 +1,364 @@
>>> +/*
>>> + * Copyright 2016, Rashmica Gupta, IBM Corp.
>>> + *
>>> + * Debug helper to dump the current kernel pagetables of the system
>>> + * so that we can see what the various memory ranges are set to.
>>> + *
>>> + * Derived from the arm64 implementation:
>>> + * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
>>> + * (C) Copyright 2008 Intel Corporation, Arjan van de Ven.
>>> + *
>>> + * This program is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU General Public License
>>> + * as published by the Free Software Foundation; version 2
>>> + * of the License.
>>> + */
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#define PUD_TYPE_MASK   (_AT(u64, 3) << 0)
>>> +#define PUD_TYPE_SECT   (_AT(u64, 1) << 0)
>>> +#define PMD_TYPE_MASK   (_AT(u64, 3) << 0)
>>> +#define PMD_TYPE_SECT   (_AT(u64, 1) << 0)
>>> +
>>> +
>>> +#if CONFIG_PGTABLE_LEVELS == 2
>>> +#include 
>>> +#elif CONFIG_PGTABLE_LEVELS == 3
>>> +#include 
>>> +#endif
>> Really? Do we have any platform with only 2 levels of page tables?
>>   
> I'm not sure - was trying to cover all the bases. If you're
> confident that we don't, I can remove it.

I am not sure though, may be Michael or Mikey can help here.

>>> +
>>> +#define pmd_sect(pmd)  ((pmd_val(pmd) & PMD_TYPE_MASK) ==
>>> PMD_TYPE_SECT)
>>> +#ifdef CONFIG_PPC_64K_PAGES
>>> +#define pud_sect(pud)   (0)
>>> +#else
>>> +#define pud_sect(pud)   ((pud_val(pud) & PUD_TYPE_MASK) == \
>>> +   PUD_TYPE_SECT)
>>> +#endif
>> Can you please explain the use of pmd_sect() and pud_sect() defines ?
>>
>>> +   
>>> +
>>> +struct addr_marker {
>>> +unsigned long start_address;
>>> +const char *name;
>>> +};
>> All the architectures are using the same structure addr_marker.
>> Can't we just move it to a generic header file? There are
>> other common structures like this in the file which are
>> used across architectures and could be moved somewhere common.
> Could do that.

Re: [PATCH 1/1] powerpc: Detect broken or mismatched toolchains

2016-02-22 Thread Sam Bobroff
On Mon, Feb 22, 2016 at 08:05:01PM -0600, Scott Wood wrote:
> On Mon, 2016-02-22 at 16:13 +1100, Sam Bobroff wrote:
> > It can currently be difficult to diagnose a build that fails due to
> > the compiler, linker or other parts of the toolchain being unable to
> > build binaries of the type required by the kernel config. For example
> > using a little endian toolchain to build a big endian kernel may
> > produce:
> > 
> > as: unrecognized option '-maltivec'
> > 
> > This patch adds a basic compile test and error message to
> > arch/powerpc/Makefile so that the above error becomes:
> > 
> > *** Sorry, your toolchain seems to be broken or incorrect. ***
> > Make sure it supports your kernel configuration (ppc64).
> > 
> > Signed-off-by: Sam Bobroff 
> > ---
> 
> How is this more useful than getting to actually see the way in which the
> toolchain (or the CFLAGS) is broken?

My reasoning was that it would be better because it happens at the start of the
build, rather than (possibly) a long way into it, and it indicates that the
problem is the toolchain setup (or config) itself rather than the file it's
trying to compile or link.

But I agree completely with what you're saying. I'll try re-working it in a way
that shows the command that fails and its output.

Cheers,
Sam.


[PATCH V4 18/18] powerpc/mm: Move hash64 specific definitions to separate header

2016-02-22 Thread Aneesh Kumar K.V
We will be adding a radix variant of these routines in the follow-up
patches. Move the hash64 variant into its own header so that we can
rename them easily later. Also split the pgalloc 64K and 4K headers.

Reviewed-by: Paul Mackerras 
Signed-off-by: Aneesh Kumar K.V 
---
 .../include/asm/book3s/64/pgalloc-hash-4k.h|  92 ++
 .../include/asm/book3s/64/pgalloc-hash-64k.h   |  51 ++
 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h  |  59 ++
 arch/powerpc/include/asm/book3s/64/pgalloc.h   | 197 +
 4 files changed, 209 insertions(+), 190 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h
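
The new pgalloc-hash.h presumably just dispatches between the 4K and 64K
variants, along these lines (a sketch, not quoted from the patch):

	#ifdef CONFIG_PPC_64K_PAGES
	#include <asm/book3s/64/pgalloc-hash-64k.h>
	#else
	#include <asm/book3s/64/pgalloc-hash-4k.h>
	#endif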

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
new file mode 100644
index ..54e655cbef7d
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-4k.h
@@ -0,0 +1,92 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H
+#define _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
+   pgtable_t pte_page)
+{
+   pmd_set(pmd, __pgtable_ptr_val(page_address(pte_page)));
+}
+
+static inline pgtable_t pmd_pgtable(pmd_t pmd)
+{
+   return pmd_page(pmd);
+}
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+ unsigned long address)
+{
+   return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
+ unsigned long address)
+{
+   struct page *page;
+   pte_t *pte;
+
+   pte = pte_alloc_one_kernel(mm, address);
+   if (!pte)
+   return NULL;
+   page = virt_to_page(pte);
+   if (!pgtable_page_ctor(page)) {
+   __free_page(page);
+   return NULL;
+   }
+   return page;
+}
+
+static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+   free_page((unsigned long)pte);
+}
+
+static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
+{
+   pgtable_page_dtor(ptepage);
+   __free_page(ptepage);
+}
+
+static inline void pgtable_free(void *table, unsigned index_size)
+{
+   if (!index_size)
+   free_page((unsigned long)table);
+   else {
+   BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
+   kmem_cache_free(PGT_CACHE(index_size), table);
+   }
+}
+
+#ifdef CONFIG_SMP
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   unsigned long pgf = (unsigned long)table;
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   pgf |= shift;
+   tlb_remove_table(tlb, (void *)pgf);
+}
+
+static inline void __tlb_remove_table(void *_table)
+{
+   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+   pgtable_free(table, shift);
+}
+#else /* !CONFIG_SMP */
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   pgtable_free(table, shift);
+}
+#endif /* CONFIG_SMP */
+
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
+ unsigned long address)
+{
+   tlb_flush_pgtable(tlb, address);
+   pgtable_page_dtor(table);
+   pgtable_free_tlb(tlb, page_address(table), 0);
+}
+
+#endif /* _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_4K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
new file mode 100644
index ..bd6caac272c6
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc-hash-64k.h
@@ -0,0 +1,51 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_64K_H
+#define _ASM_POWERPC_BOOK3S_64_PGALLOC_HASH_64K_H
+
+extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
+extern void page_table_free(struct mm_struct *, unsigned long *, int);
+extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
+#ifdef CONFIG_SMP
+extern void __tlb_remove_table(void *_table);
+#endif
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
+   pgtable_t pte_page)
+{
+   pmd_set(pmd, __pgtable_ptr_val(pte_page));
+}
+
+static inline pgtable_t pmd_pgtable(pmd_t pmd)
+{
+   return (pgtable_t)pmd_page_vaddr(pmd);
+}
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+ unsigned long address)
+{
+   return (pte_t *)page_table_alloc(mm, address, 1);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm

[PATCH V4 16/18] powerpc/mm: THP is only available on hash64 as of now

2016-02-22 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable-hash64.c | 373 +++
 arch/powerpc/mm/pgtable_64.c | 373 ---
 2 files changed, 373 insertions(+), 373 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index bcc54aae2e7c..0139b623fdae 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -21,6 +21,9 @@
 
 #include "mmu_decl.h"
 
+#define CREATE_TRACE_POINTS
+#include 
+
 #if PGTABLE_RANGE > USER_VSID_RANGE
 #warning Limited user VSID range means pagetable space is wasted
 #endif
@@ -245,3 +248,373 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
/* Perform the setting of the PTE */
__set_pte_at(mm, addr, ptep, pte, 0);
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+
+/*
+ * This is called when relaxing access to a hugepage. It's also called in the 
page
+ * fault path when we don't hit any of the major fault cases, ie, a minor
+ * update of _PAGE_ACCESSED, _PAGE_DIRTY, etc... The generic code will have
+ * handled those two for us, we additionally deal with missing execute
+ * permission here on some processors
+ */
+int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp, pmd_t entry, int dirty)
+{
+   int changed;
+#ifdef CONFIG_DEBUG_VM
+   WARN_ON(!pmd_trans_huge(*pmdp));
+   assert_spin_locked(&vma->vm_mm->page_table_lock);
+#endif
+   changed = !pmd_same(*(pmdp), entry);
+   if (changed) {
+   __ptep_set_access_flags(pmdp_ptep(pmdp), pmd_pte(entry));
+   /*
+* Since we are not supporting SW TLB systems, we don't
+* have any thing similar to flush_tlb_page_nohash()
+*/
+   }
+   return changed;
+}
+
+unsigned long pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, unsigned long clr,
+ unsigned long set)
+{
+
+   unsigned long old, tmp;
+
+#ifdef CONFIG_DEBUG_VM
+   WARN_ON(!pmd_trans_huge(*pmdp));
+   assert_spin_locked(&mm->page_table_lock);
+#endif
+
+#ifdef PTE_ATOMIC_UPDATES
+   __asm__ __volatile__(
+   "1: ldarx   %0,0,%3\n\
+   andi.   %1,%0,%6\n\
+   bne-1b \n\
+   andc%1,%0,%4 \n\
+   or  %1,%1,%7\n\
+   stdcx.  %1,0,%3 \n\
+   bne-1b"
+   : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
+   : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (_PAGE_BUSY), "r" (set)
+   : "cc" );
+#else
+   old = pmd_val(*pmdp);
+   *pmdp = __pmd((old & ~clr) | set);
+#endif
+   trace_hugepage_update(addr, old, clr, set);
+   if (old & _PAGE_HASHPTE)
+   hpte_do_hugepage_flush(mm, addr, pmdp, old);
+   return old;
+}
+
+pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+   pmd_t pmd;
+
+   VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+   VM_BUG_ON(pmd_trans_huge(*pmdp));
+
+   pmd = *pmdp;
+   pmd_clear(pmdp);
+   /*
+* Wait for all pending hash_page to finish. This is needed
+* in case of subpage collapse. When we collapse normal pages
+* to hugepage, we first clear the pmd, then invalidate all
+* the PTE entries. The assumption here is that any low level
+* page fault will see a none pmd and take the slow path that
+* will wait on mmap_sem. But we could very well be in a
+* hash_page with local ptep pointer value. Such a hash page
+* can result in adding new HPTE entries for normal subpages.
+* That means we could be modifying the page content as we
+* copy them to a huge page. So wait for parallel hash_page
+* to finish before invalidating HPTE entries. We can do this
+* by sending an IPI to all the cpus and executing a dummy
+* function there.
+*/
+   kick_all_cpus_sync();
+   /*
+* Now invalidate the hpte entries in the range
+* covered by pmd. This make sure we take a
+* fault and will find the pmd as none, which will
+* result in a major fault which takes mmap_sem and
+* hence wait for collapse to complete. Without this
+* the __collapse_huge_page_copy can result in copying
+* the old content.
+*/
+   flush_tlb_pmd_range(vma->vm_mm, &pmd, address);
+   return pmd;
+}
+
+int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+   return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
+}
+
+/*
+ * We currently remove entries from the hashtable regardless of whether
+ * the entry was young or dirty. The generic routines only flush if the
+ * entry was young or dirty which i

[PATCH V4 17/18] powerpc/mm: Use generic version of pmdp_clear_flush_young

2016-02-22 Thread Aneesh Kumar K.V
The radix variant is going to require a flush_tlb_range. We can't then
have this as a static inline because of the use of HPAGE_PMD_SIZE, so
we are forced to make it a real function, in which case we can just use
the generic version.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 ---
 arch/powerpc/mm/pgtable-hash64.c | 10 ++
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index bf132bbbe9d9..06b104f6a989 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -316,9 +316,6 @@ extern int pmdp_set_access_flags(struct vm_area_struct *vma,
 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
 extern int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 unsigned long address, pmd_t *pmdp);
-#define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
-extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp);
 
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 extern pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index 0139b623fdae..cbd81345fdec 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -350,12 +350,6 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, 
unsigned long address,
return pmd;
 }
 
-int pmdp_test_and_clear_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp)
-{
-   return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
-}
-
 /*
  * We currently remove entries from the hashtable regardless of whether
  * the entry was young or dirty. The generic routines only flush if the
@@ -364,8 +358,8 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
  * We should be more intelligent about this but for the moment we override
  * these functions and force a tlb flush unconditionally
  */
-int pmdp_clear_flush_young(struct vm_area_struct *vma,
- unsigned long address, pmd_t *pmdp)
+int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
 {
return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
 }
-- 
2.5.0


[PATCH V4 15/18] powerpc/mm: Move hash page table related functions to pgtable-hash64.c

2016-02-22 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h|   2 +
 arch/powerpc/include/asm/nohash/64/pgtable.h |   3 +
 arch/powerpc/mm/Makefile |   3 +-
 arch/powerpc/mm/init_64.c| 114 +
 arch/powerpc/mm/mem.c|  29 +---
 arch/powerpc/mm/mmu_decl.h   |   5 -
 arch/powerpc/mm/pgtable-book3e.c | 163 ++
 arch/powerpc/mm/pgtable-hash64.c | 247 +++
 arch/powerpc/mm/pgtable.c|   9 +
 arch/powerpc/mm/pgtable_64.c |  88 --
 arch/powerpc/mm/ppc_mmu_32.c |  30 
 11 files changed, 464 insertions(+), 229 deletions(-)
 create mode 100644 arch/powerpc/mm/pgtable-book3e.c
 create mode 100644 arch/powerpc/mm/pgtable-hash64.c

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index f948d081f28e..9b451cb8294a 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -527,6 +527,8 @@ static inline void hpte_do_hugepage_flush(struct mm_struct 
*mm,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+extern int map_kernel_page(unsigned long ea, unsigned long pa,
+  unsigned long flags);
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 10debb93c4a4..37045fb9f41e 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -362,6 +362,9 @@ static inline void __ptep_set_access_flags(pte_t *ptep, 
pte_t entry)
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
+extern int map_kernel_page(unsigned long ea, unsigned long pa,
+  unsigned long flags);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_NOHASH_64_PGTABLE_H */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 1ffeda85c086..6b5cc805c7ba 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -13,7 +13,8 @@ obj-$(CONFIG_PPC_MMU_NOHASH)  += mmu_context_nohash.o 
tlb_nohash.o \
   tlb_nohash_low.o
 obj-$(CONFIG_PPC_BOOK3E)   += tlb_low_$(CONFIG_WORD_SIZE)e.o
 hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o
-obj-$(CONFIG_PPC_STD_MMU_64)   += hash_utils_64.o slb_low.o slb.o $(hash64-y)
+obj-$(CONFIG_PPC_BOOK3E_64)   += pgtable-book3e.o
+obj-$(CONFIG_PPC_STD_MMU_64)   += pgtable-hash64.o hash_utils_64.o slb_low.o 
slb.o $(hash64-y)
 obj-$(CONFIG_PPC_STD_MMU_32)   += ppc_mmu_32.o hash_low_32.o
 obj-$(CONFIG_PPC_STD_MMU)  += tlb_hash$(CONFIG_WORD_SIZE).o \
   mmu_context_hash$(CONFIG_WORD_SIZE).o
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 8ce1ec24d573..05b025a0efe6 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -65,38 +65,10 @@
 
 #include "mmu_decl.h"
 
-#ifdef CONFIG_PPC_STD_MMU_64
-#if PGTABLE_RANGE > USER_VSID_RANGE
-#warning Limited user VSID range means pagetable space is wasted
-#endif
-
-#if (TASK_SIZE_USER64 < PGTABLE_RANGE) && (TASK_SIZE_USER64 < USER_VSID_RANGE)
-#warning TASK_SIZE is smaller than it needs to be.
-#endif
-#endif /* CONFIG_PPC_STD_MMU_64 */
-
 phys_addr_t memstart_addr = ~0;
 EXPORT_SYMBOL_GPL(memstart_addr);
 phys_addr_t kernstart_addr;
 EXPORT_SYMBOL_GPL(kernstart_addr);
-
-static void pgd_ctor(void *addr)
-{
-   memset(addr, 0, PGD_TABLE_SIZE);
-}
-
-static void pud_ctor(void *addr)
-{
-   memset(addr, 0, PUD_TABLE_SIZE);
-}
-
-static void pmd_ctor(void *addr)
-{
-   memset(addr, 0, PMD_TABLE_SIZE);
-}
-
-struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
-
 /*
  * Create a kmem_cache() for pagetables.  This is not used for PTE
  * pages - they're linked to struct page, come from the normal free
@@ -104,6 +76,7 @@ struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
  * everything else.  Caches created by this function are used for all
  * the higher level pagetables, and for hugepage pagetables.
  */
+struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
 {
char *name;
@@ -138,25 +111,6 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void 
*))
pr_debug("Allocated pgtable cache for order %d\n", shift);
 }
 
-
-void pgtable_cache_init(void)
-{
-   pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
-   pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
-   /*
-* In all current configs, when the PUD index exists it's the
-* same size as either the pgd or pmd index except with THP enabled
-* on book3s 64
-*/
-   if (PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE))
-   pgtable_cache_add(PUD_INDEX_SIZE, pud_ctor);

[PATCH V4 14/18] powerpc/mm: Create a new headers for tlbflush for hash64

2016-02-22 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 94 ++
 arch/powerpc/include/asm/tlbflush.h| 92 +
 2 files changed, 95 insertions(+), 91 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
new file mode 100644
index ..1b753f96b374
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -0,0 +1,94 @@
+#ifndef _ASM_POWERPC_BOOK3S_64_TLBFLUSH_HASH_H
+#define _ASM_POWERPC_BOOK3S_64_TLBFLUSH_HASH_H
+
+#define MMU_NO_CONTEXT 0
+
+/*
+ * TLB flushing for 64-bit hash-MMU CPUs
+ */
+
+#include 
+#include 
+
+#define PPC64_TLB_BATCH_NR 192
+
+struct ppc64_tlb_batch {
+   int active;
+   unsigned long   index;
+   struct mm_struct*mm;
+   real_pte_t  pte[PPC64_TLB_BATCH_NR];
+   unsigned long   vpn[PPC64_TLB_BATCH_NR];
+   unsigned intpsize;
+   int ssize;
+};
+DECLARE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
+
+extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch);
+
+#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
+
+static inline void arch_enter_lazy_mmu_mode(void)
+{
+   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
+
+   batch->active = 1;
+}
+
+static inline void arch_leave_lazy_mmu_mode(void)
+{
+   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
+
+   if (batch->index)
+   __flush_tlb_pending(batch);
+   batch->active = 0;
+}
+
+#define arch_flush_lazy_mmu_mode()  do {} while (0)
+
+
+extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
+   int ssize, unsigned long flags);
+extern void flush_hash_range(unsigned long number, int local);
+extern void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
+   pmd_t *pmdp, unsigned int psize, int ssize,
+   unsigned long flags);
+
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+}
+
+static inline void flush_tlb_mm(struct mm_struct *mm)
+{
+}
+
+static inline void local_flush_tlb_page(struct vm_area_struct *vma,
+   unsigned long vmaddr)
+{
+}
+
+static inline void flush_tlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+}
+
+static inline void flush_tlb_page_nohash(struct vm_area_struct *vma,
+unsigned long vmaddr)
+{
+}
+
+static inline void flush_tlb_range(struct vm_area_struct *vma,
+  unsigned long start, unsigned long end)
+{
+}
+
+static inline void flush_tlb_kernel_range(unsigned long start,
+ unsigned long end)
+{
+}
+
+/* Private function for use by PCI IO mapping code */
+extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
+unsigned long end);
+extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd,
+   unsigned long addr);
+#endif /*  _ASM_POWERPC_BOOK3S_64_TLBFLUSH_HASH_H */
diff --git a/arch/powerpc/include/asm/tlbflush.h 
b/arch/powerpc/include/asm/tlbflush.h
index 23d351ca0303..9f77f85e3e99 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -78,97 +78,7 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm)
 }
 
 #elif defined(CONFIG_PPC_STD_MMU_64)
-
-#define MMU_NO_CONTEXT 0
-
-/*
- * TLB flushing for 64-bit hash-MMU CPUs
- */
-
-#include 
-#include 
-
-#define PPC64_TLB_BATCH_NR 192
-
-struct ppc64_tlb_batch {
-   int active;
-   unsigned long   index;
-   struct mm_struct*mm;
-   real_pte_t  pte[PPC64_TLB_BATCH_NR];
-   unsigned long   vpn[PPC64_TLB_BATCH_NR];
-   unsigned intpsize;
-   int ssize;
-};
-DECLARE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
-
-extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch);
-
-#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
-
-static inline void arch_enter_lazy_mmu_mode(void)
-{
-   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
-
-   batch->active = 1;
-}
-
-static inline void arch_leave_lazy_mmu_mode(void)
-{
-   struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
-
-   if (batch->index)
-   __flush_tlb_pending(batch);
-   batch->active = 0;
-}
-
-#define arch_flush_lazy_mmu_mode()  do {} while (0)
-
-
-extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
-   int ssize, unsigned long flags);
-extern void flush_hash_range(unsigned long number, int local);

[PATCH V4 13/18] powerpc/mm: Move hash related mmu-*.h headers to book3s/

2016-02-22 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/{mmu-hash32.h => book3s/32/mmu-hash.h} | 0
 arch/powerpc/include/asm/{mmu-hash64.h => book3s/64/mmu-hash.h} | 0
 arch/powerpc/include/asm/mmu.h  | 4 ++--
 arch/powerpc/kernel/idle_power7.S   | 2 +-
 arch/powerpc/kvm/book3s_32_mmu_host.c   | 2 +-
 arch/powerpc/kvm/book3s_64_mmu.c| 2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c   | 2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
 arch/powerpc/kvm/book3s_64_vio.c| 2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 12 files changed, 11 insertions(+), 11 deletions(-)
 rename arch/powerpc/include/asm/{mmu-hash32.h => book3s/32/mmu-hash.h} (100%)
 rename arch/powerpc/include/asm/{mmu-hash64.h => book3s/64/mmu-hash.h} (100%)

diff --git a/arch/powerpc/include/asm/mmu-hash32.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-hash32.h
rename to arch/powerpc/include/asm/book3s/32/mmu-hash.h
diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-hash64.h
rename to arch/powerpc/include/asm/book3s/64/mmu-hash.h
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 54d46504733d..8ca1c983bf6c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -183,10 +183,10 @@ static inline void assert_pte_locked(struct mm_struct 
*mm, unsigned long addr)
 
 #if defined(CONFIG_PPC_STD_MMU_64)
 /* 64-bit classic hash table MMU */
-#  include 
+#include 
 #elif defined(CONFIG_PPC_STD_MMU_32)
 /* 32-bit classic hash table MMU */
-#  include 
+#include 
 #elif defined(CONFIG_40x)
 /* 40x-style software loaded TLB */
 #  include 
diff --git a/arch/powerpc/kernel/idle_power7.S 
b/arch/powerpc/kernel/idle_power7.S
index cf4fb5429cf1..470ceebd2d23 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -19,7 +19,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #undef DEBUG
 
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 55c4d51ea3e2..999106991a76 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -22,7 +22,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 9bf7031a67ff..b9131aa1aedf 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -26,7 +26,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 /* #define DEBUG_MMU */
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 913cd2198fa6..114edace6cdd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -23,7 +23,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fb37290a57b4..c7b78d8336b2 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -32,7 +32,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 54cf9bc94dad..9c3b76bb69d9 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -30,7 +30,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 89e96b3e0039..039028d3ccb5 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -29,7 +29,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 91700518bbf3..4cb8db05f3e5 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6ee26de9a1de..c613fee0b9f7 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #define VCPU_GPRS_TM(reg) (((reg) * ULONG_SIZE) + VCPU_GPR_TM)

[PATCH V4 12/18] powerpc/mm: Use flush_tlb_page in ptep_clear_flush_young

2016-02-22 Thread Aneesh Kumar K.V
This should not have any impact on the hash linux implementation, but
radix will require us to flush the TLB after clearing the accessed bit.
Also move code that is not dependent on pte bits to the generic header.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 45 +---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 39 
 arch/powerpc/include/asm/mmu-hash64.h|  2 +-
 3 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index d0ee6fcef823..f948d081f28e 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -272,6 +272,14 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
return old;
 }
 
+/*
+ * We currently remove entries from the hashtable regardless of whether
+ * the entry was young or dirty. The generic routines only flush if the
+ * entry was young or dirty which is not good enough.
+ *
+ * We should be more intelligent about this but for the moment we override
+ * these functions and force a tlb flush unconditionally
+ */
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pte_t *ptep)
 {
@@ -282,13 +290,6 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
return (old & _PAGE_ACCESSED) != 0;
 }
-#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-#define ptep_test_and_clear_young(__vma, __addr, __ptep)  \
-({\
-   int __r;   \
-   __r = __ptep_test_and_clear_young((__vma)->vm_mm, __addr, __ptep); \
-   __r;   \
-})
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
@@ -310,36 +311,6 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
pte_update(mm, addr, ptep, _PAGE_RW, 0, 1);
 }
 
-/*
- * We currently remove entries from the hashtable regardless of whether
- * the entry was young or dirty. The generic routines only flush if the
- * entry was young or dirty which is not good enough.
- *
- * We should be more intelligent about this but for the moment we override
- * these functions and force a tlb flush unconditionally
- */
-#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
-#define ptep_clear_flush_young(__vma, __address, __ptep)   \
-({ \
-   int __young = __ptep_test_and_clear_young((__vma)->vm_mm, __address, \
- __ptep);  \
-   __young;\
-})
-
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
-}
-
-static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
-pte_t * ptep)
-{
-   pte_update(mm, addr, ptep, ~0UL, 0, 0);
-}
-
 
 /* Set the dirty and/or accessed bits atomically in a linux PTE, this
  * function doesn't need to flush the hash entry
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 77d3ce05798e..bf132bbbe9d9 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -8,6 +8,10 @@
 #include 
 #include 
 
+#ifndef __ASSEMBLY__
+#include 
+#include 
+#endif
 /*
  * The second half of the kernel virtual space is used for IO mappings,
  * it's itself carved into the PIO region (ISA and PHB IO space) and
@@ -62,6 +66,41 @@
 
 #endif /* __real_pte */
 
+#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
+   unsigned long address,
+   pte_t *ptep)
+{
+   return  __ptep_test_and_clear_young(vma->vm_mm, address, ptep);
+}
+
+#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
+static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
+unsigned long address, pte_t *ptep)
+{
+   int young;
+
+   young = __ptep_test_and_clear_young(vma->vm_mm, address, ptep);
+   if (young)
+   flush_tlb_page(vma, address);
+   return young;
+}
+
+#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
+  unsigned long addr, pte_t *ptep)
+{
+   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
+   return __pte(old);
+}

[PATCH V4 11/18] powerpc/mm: Hugetlbfs is book3s_64 and fsl_book3e (32 or 64)

2016-02-22 Thread Aneesh Kumar K.V
We move a large part of the fsl related code to hugetlbpage-book3e.c.
Only code movement. This also avoids #ifdefs in the code.

Even though we allow hugetlbfs only for book3s 64 and fsl book3e, I am
still retaining the #ifdef in hugetlbpage-book3e.c. It looks like there
was an attempt to support hugetlbfs on other non hash platforms. I
didn't want to lose that work.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/hugetlb.h   |   1 +
 arch/powerpc/mm/hugetlbpage-book3e.c | 293 +
 arch/powerpc/mm/hugetlbpage-hash64.c | 120 +++
 arch/powerpc/mm/hugetlbpage.c| 401 +--
 4 files changed, 415 insertions(+), 400 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 42814f0567cc..16078780aa7b 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -47,6 +47,7 @@ static inline unsigned int hugepd_shift(hugepd_t hpd)
 
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
+#define hugepd_none(hpd)   ((hpd).pd == 0)
 
 static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
unsigned pdshift)
diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c 
b/arch/powerpc/mm/hugetlbpage-book3e.c
index 7e6d0880813f..4c43a104e35c 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -7,6 +7,39 @@
  */
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Tracks gpages after the device tree is scanned and before the
+ * huge_boot_pages list is ready.  On non-Freescale implementations, this is
+ * just used to track 16G pages and so is a single array.  FSL-based
+ * implementations may have more than one gpage size, so we need multiple
+ * arrays
+ */
+#ifdef CONFIG_PPC_FSL_BOOK3E
+#define MAX_NUMBER_GPAGES  128
+struct psize_gpages {
+   u64 gpage_list[MAX_NUMBER_GPAGES];
+   unsigned int nr_gpages;
+};
+static struct psize_gpages gpage_freearray[MMU_PAGE_COUNT];
+#endif
+
+/*
+ * These macros define how to determine which level of the page table holds
+ * the hpdp.
+ */
+#ifdef CONFIG_PPC_FSL_BOOK3E
+#define HUGEPD_PGD_SHIFT PGDIR_SHIFT
+#define HUGEPD_PUD_SHIFT PUD_SHIFT
+#else
+#define HUGEPD_PGD_SHIFT PUD_SHIFT
+#define HUGEPD_PUD_SHIFT PMD_SHIFT
+#endif
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
 #ifdef CONFIG_PPC64
@@ -197,3 +230,263 @@ void flush_hugetlb_page(struct vm_area_struct *vma, 
unsigned long vmaddr)
 
__flush_tlb_page(vma->vm_mm, vmaddr, tsize, 0);
 }
+
+static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
+  unsigned long address, unsigned pdshift, unsigned 
pshift)
+{
+   struct kmem_cache *cachep;
+   pte_t *new;
+
+   int i;
+   int num_hugepd = 1 << (pshift - pdshift);
+   cachep = hugepte_cache;
+
+   new = kmem_cache_zalloc(cachep, GFP_KERNEL|__GFP_REPEAT);
+
+   BUG_ON(pshift > HUGEPD_SHIFT_MASK);
+   BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
+
+   if (! new)
+   return -ENOMEM;
+
+   spin_lock(&mm->page_table_lock);
+   /*
+* We have multiple higher-level entries that point to the same
+* actual pte location.  Fill in each as we go and backtrack on error.
+* We need all of these so the DTLB pgtable walk code can find the
+* right higher-level entry without knowing if it's a hugepage or not.
+*/
+   for (i = 0; i < num_hugepd; i++, hpdp++) {
+   if (unlikely(!hugepd_none(*hpdp)))
+   break;
+   else
+   /* We use the old format for PPC_FSL_BOOK3E */
+   hpdp->pd = ((unsigned long)new & ~PD_HUGE) | pshift;
+   }
+   /* If we bailed from the for loop early, an error occurred, clean up */
+   if (i < num_hugepd) {
+   for (i = i - 1 ; i >= 0; i--, hpdp--)
+   hpdp->pd = 0;
+   kmem_cache_free(cachep, new);
+   }
+   spin_unlock(&mm->page_table_lock);
+   return 0;
+}
+
+pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long 
sz)
+{
+   pgd_t *pg;
+   pud_t *pu;
+   pmd_t *pm;
+   hugepd_t *hpdp = NULL;
+   unsigned pshift = __ffs(sz);
+   unsigned pdshift = PGDIR_SHIFT;
+
+   addr &= ~(sz-1);
+
+   pg = pgd_offset(mm, addr);
+
+   if (pshift >= HUGEPD_PGD_SHIFT) {
+   hpdp = (hugepd_t *)pg;
+   } else {
+   pdshift = PUD_SHIFT;
+   pu = pud_alloc(mm, pg, addr);
+   if (pshift >= HUGEPD_PUD_SHIFT) {
+   hpdp = (hugepd_t *)pu;
+   } else {
+   pdshift = PMD_SHIFT;
+   pm = pmd_alloc(mm, pu, addr);
+   hpdp = (hugepd_t *)pm;
+   }
+   }
+
+   if (!hpdp)
+   return NULL;
+
+   BUG_ON(!hugepd_none(*hpdp) && !hugepd_ok(*hpdp));

[PATCH V4 10/18] powerpc/mm: Copy pgalloc (part 3)

2016-02-22 Thread Aneesh Kumar K.V
64-bit book3s now always has a 4-level page table irrespective of the
linux page size. Move the related code out of the #ifdef.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 55 +---
 1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 54017260c8bf..c6ba334a38c1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -51,7 +51,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
 }
 
-#ifndef CONFIG_PPC_64K_PAGES
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
 {
pgd_set(pgd, __pgtable_ptr_val(pud));
@@ -79,6 +78,14 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, 
pmd_t *pmd,
pmd_set(pmd, __pgtable_ptr_val(pte));
 }
 
+/*
+ * FIXME!!
+ * Between 4K and 64K pages, we differ in what is stored in pmd. ie.
+ * typedef pte_t *pgtable_t; -> 64K
+ * typedef struct page *pgtable_t; -> 4k
+ */
+#ifndef CONFIG_PPC_64K_PAGES
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
pgtable_t pte_page)
 {
@@ -176,36 +183,6 @@ extern void pgtable_free_tlb(struct mmu_gather *tlb, void 
*table, int shift);
 extern void __tlb_remove_table(void *_table);
 #endif
 
-#ifndef __PAGETABLE_PUD_FOLDED
-/* book3s 64 is 4 level page table */
-static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
-{
-   pgd_set(pgd, __pgtable_ptr_val(pud));
-}
-
-static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   return kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
-   GFP_KERNEL|__GFP_REPEAT);
-}
-
-static inline void pud_free(struct mm_struct *mm, pud_t *pud)
-{
-   kmem_cache_free(PGT_CACHE(PUD_INDEX_SIZE), pud);
-}
-#endif
-
-static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
-{
-   pud_set(pud, __pgtable_ptr_val(pmd));
-}
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
-  pte_t *pte)
-{
-   pmd_set(pmd, __pgtable_ptr_val(pte));
-}
-
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
pgtable_t pte_page)
 {
@@ -258,13 +235,17 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
 }
 
-#define __pmd_free_tlb(tlb, pmd, addr)   \
-   pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
-#ifndef __PAGETABLE_PUD_FOLDED
-#define __pud_free_tlb(tlb, pud, addr)   \
-   pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
+  unsigned long address)
+{
+return pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX);
+}
 
-#endif /* __PAGETABLE_PUD_FOLDED */
+static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
+  unsigned long address)
+{
+pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE);
+}
 
 #define check_pgt_cache()  do { } while (0)
 
-- 
2.5.0


[PATCH V4 09/18] powerpc/mm: Copy pgalloc (part 2)

2016-02-22 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h   |  6 +++---
 arch/powerpc/include/asm/book3s/64/pgalloc.h   | 17 ++--
 arch/powerpc/include/asm/book3s/pgalloc.h  | 19 ++
 .../asm/{pgalloc-32.h => nohash/32/pgalloc.h}  |  0
 .../asm/{pgalloc-64.h => nohash/64/pgalloc.h}  |  0
 arch/powerpc/include/asm/nohash/pgalloc.h  | 23 ++
 arch/powerpc/include/asm/pgalloc.h | 19 +++---
 7 files changed, 59 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/pgalloc.h
 rename arch/powerpc/include/asm/{pgalloc-32.h => nohash/32/pgalloc.h} (100%)
 rename arch/powerpc/include/asm/{pgalloc-64.h => nohash/64/pgalloc.h} (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/pgalloc.h

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 76d6b9e0c8a9..a2350194fc76 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGALLOC_32_H
-#define _ASM_POWERPC_PGALLOC_32_H
+#ifndef _ASM_POWERPC_BOOK3S_32_PGALLOC_H
+#define _ASM_POWERPC_BOOK3S_32_PGALLOC_H
 
 #include 
 
@@ -106,4 +106,4 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
pgtable_page_dtor(table);
pgtable_free_tlb(tlb, page_address(table), 0);
 }
-#endif /* _ASM_POWERPC_PGALLOC_32_H */
+#endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 8d5fc3ac43da..54017260c8bf 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGALLOC_64_H
-#define _ASM_POWERPC_PGALLOC_64_H
+#ifndef _ASM_POWERPC_BOOK3S_64_PGALLOC_H
+#define _ASM_POWERPC_BOOK3S_64_PGALLOC_H
 /*
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -52,8 +52,10 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 }
 
 #ifndef CONFIG_PPC_64K_PAGES
-
-#define pgd_populate(MM, PGD, PUD) pgd_set(PGD, __pgtable_ptr_val(PUD))
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+{
+   pgd_set(pgd, __pgtable_ptr_val(pud));
+}
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
@@ -83,7 +85,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
pmd_set(pmd, __pgtable_ptr_val(page_address(pte_page)));
 }
 
-#define pmd_pgtable(pmd) pmd_page(pmd)
+static inline pgtable_t pmd_pgtable(pmd_t pmd)
+{
+   return pmd_page(pmd);
+}
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
  unsigned long address)
@@ -263,4 +268,4 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
 
 #define check_pgt_cache()  do { } while (0)
 
-#endif /* _ASM_POWERPC_PGALLOC_64_H */
+#endif /* _ASM_POWERPC_BOOK3S_64_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/pgalloc.h 
b/arch/powerpc/include/asm/book3s/pgalloc.h
new file mode 100644
index ..54f591e9572e
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/pgalloc.h
@@ -0,0 +1,19 @@
+#ifndef _ASM_POWERPC_BOOK3S_PGALLOC_H
+#define _ASM_POWERPC_BOOK3S_PGALLOC_H
+
+#include 
+
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
+unsigned long address)
+{
+
+}
+
+#ifdef CONFIG_PPC64
+#include 
+#else
+#include 
+#endif
+
+#endif /* _ASM_POWERPC_BOOK3S_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/pgalloc-32.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
similarity index 100%
rename from arch/powerpc/include/asm/pgalloc-32.h
rename to arch/powerpc/include/asm/nohash/32/pgalloc.h
diff --git a/arch/powerpc/include/asm/pgalloc-64.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
similarity index 100%
rename from arch/powerpc/include/asm/pgalloc-64.h
rename to arch/powerpc/include/asm/nohash/64/pgalloc.h
diff --git a/arch/powerpc/include/asm/nohash/pgalloc.h 
b/arch/powerpc/include/asm/nohash/pgalloc.h
new file mode 100644
index ..b39ec956d71e
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/pgalloc.h
@@ -0,0 +1,23 @@
+#ifndef _ASM_POWERPC_NOHASH_PGALLOC_H
+#define _ASM_POWERPC_NOHASH_PGALLOC_H
+
+#include <linux/mm.h>
+
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+#ifdef CONFIG_PPC64
+extern void tlb_flush_pgtable(struct mmu_gather *tlb, unsigned long address);
+#else
+/* 44x etc which is BOOKE not BOOK3E */
+static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
+unsigned long address)
+{
+
+}
+#endif /* !CONFIG_PPC_BOOK3E */
+
+#ifdef CONFIG_PPC64
+#include <asm/nohash/64/pgalloc.h>
+#else
+#include <asm/nohash/32/pgalloc.h>
+#endif
+
+#endif /* _ASM_POWERPC_NOHASH_PGALLOC_H */

[PATCH V4 07/18] powerpc/mm: Update masked bits for linux page table

2016-02-22 Thread Aneesh Kumar K.V
We now use physical addresses in the upper page table tree levels. Even
though they are aligned to their size, for the masked bits we use the
overloaded bit positions as per PowerISA 3.0. We keep the bad bits check
as it is, and will make it conditional there when adding radix. The bad
bits check also checks for reserved bits, and we overload some of the
reserved fields of radix in the hash config.

Signed-off-by: Aneesh Kumar K.V 
---
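A note for readers: the masked-bits value is what the page table walk
strips from an upper-level entry before converting it back into a
pointer, so the overloaded ISA 3.0 fields never leak into the address.
A minimal sketch of the consuming pattern (pmd_page_vaddr_sketch is an
illustrative name, mirroring the pgd_page_vaddr() accessor elsewhere in
this series):

	/* Sketch: mask the overloaded bits before converting to a vaddr */
	static inline unsigned long pmd_page_vaddr_sketch(pmd_t pmd)
	{
		return (unsigned long)__va(pmd_val(pmd) & ~PMD_MASKED_BITS);
	}
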
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index f0f5f91d7909..60c2c912c3a7 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -60,15 +60,12 @@
 #define PTE_FRAG_SIZE_SHIFT  12
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
-/*
- * Bits to mask out from a PMD to get to the PTE page
- * PMDs point to PTE table fragments which are PTE_FRAG_SIZE aligned.
- */
-#define PMD_MASKED_BITS		(PTE_FRAG_SIZE - 1)
-/* Bits to mask out from a PGD/PUD to get to the PMD page */
-#define PUD_MASKED_BITS		0x1ff
-/* FIXME!! Will be fixed in next patch */
-#define PGD_MASKED_BITS		0
+/* Bits to mask out from a PMD to get to the PTE page */
+#define PMD_MASKED_BITS		0xc0000000000000ffUL
+/* Bits to mask out from a PUD to get to the PMD page */
+#define PUD_MASKED_BITS		0xc0000000000000ffUL
+/* Bits to mask out from a PGD to get to the PUD page */
+#define PGD_MASKED_BITS		0xc0000000000000ffUL
 
 #ifndef __ASSEMBLY__
 
-- 
2.5.0


[PATCH V4 08/18] powerpc/mm: Copy pgalloc (part 1)

2016-02-22 Thread Aneesh Kumar K.V
This patch makes a copy of the pgalloc routines for book3s. The idea is
to have a hash64 copy of these pgalloc routines which can later be
updated to have a radix conditional. Radix introduces a new page table
format with different page table sizes.

This mostly does:

cp pgalloc-32.h book3s/32/pgalloc.h
cp pgalloc-64.h book3s/64/pgalloc.h

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 109 +++
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 266 +++
 2 files changed, 375 insertions(+)
 create mode 100644 arch/powerpc/include/asm/book3s/32/pgalloc.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgalloc.h

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
new file mode 100644
index ..76d6b9e0c8a9
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -0,0 +1,109 @@
+#ifndef _ASM_POWERPC_PGALLOC_32_H
+#define _ASM_POWERPC_PGALLOC_32_H
+
+#include <linux/threads.h>
+
+/* For 32-bit, all levels of page tables are just drawn from get_free_page() */
+#define MAX_PGTABLE_INDEX_SIZE 0
+
+extern void __bad_pte(pmd_t *pmd);
+
+extern pgd_t *pgd_alloc(struct mm_struct *mm);
+extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
+
+/*
+ * We don't have any real pmd's, and this code never triggers because
+ * the pgd will always be present..
+ */
+/* #define pmd_alloc_one(mm,address)   ({ BUG(); ((pmd_t *)2); }) */
+#define pmd_free(mm, x)		do { } while (0)
+#define __pmd_free_tlb(tlb,x,a)	do { } while (0)
+/* #define pgd_populate(mm, pmd, pte)  BUG() */
+
+#ifndef CONFIG_BOOKE
+
+static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
+  pte_t *pte)
+{
+   *pmdp = __pmd(__pa(pte) | _PMD_PRESENT);
+}
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
+   pgtable_t pte_page)
+{
+   *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_PRESENT);
+}
+
+#define pmd_pgtable(pmd) pmd_page(pmd)
+#else
+
+static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
+  pte_t *pte)
+{
+   *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
+}
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
+   pgtable_t pte_page)
+{
+   *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | 
_PMD_PRESENT);
+}
+
+#define pmd_pgtable(pmd) pmd_page(pmd)
+#endif
+
+extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
+extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
+
+static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+   free_page((unsigned long)pte);
+}
+
+static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
+{
+   pgtable_page_dtor(ptepage);
+   __free_page(ptepage);
+}
+
+static inline void pgtable_free(void *table, unsigned index_size)
+{
+   BUG_ON(index_size); /* 32-bit doesn't use this */
+   free_page((unsigned long)table);
+}
+
+#define check_pgt_cache()  do { } while (0)
+
+#ifdef CONFIG_SMP
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   unsigned long pgf = (unsigned long)table;
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   pgf |= shift;
+   tlb_remove_table(tlb, (void *)pgf);
+}
+
+static inline void __tlb_remove_table(void *_table)
+{
+   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+   pgtable_free(table, shift);
+}
+#else
+static inline void pgtable_free_tlb(struct mmu_gather *tlb,
+   void *table, int shift)
+{
+   pgtable_free(table, shift);
+}
+#endif
+
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
+ unsigned long address)
+{
+   tlb_flush_pgtable(tlb, address);
+   pgtable_page_dtor(table);
+   pgtable_free_tlb(tlb, page_address(table), 0);
+}
+#endif /* _ASM_POWERPC_PGALLOC_32_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
new file mode 100644
index ..8d5fc3ac43da
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -0,0 +1,266 @@
+#ifndef _ASM_POWERPC_PGALLOC_64_H
+#define _ASM_POWERPC_PGALLOC_64_H
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/slab.h>
+#include <linux/cpumask.h>
+#include <linux/percpu.h>
+
+struct vmemmap_backing {
+   struct vmemmap_backing *list;
+   unsigned long phys;
+   unsigned long virt_addr;
+};
+

[PATCH V4 06/18] powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table

2016-02-22 Thread Aneesh Kumar K.V
This is needed so that we can support both hash and radix page tables
using a single kernel. A radix kernel uses a 4 level table.

Signed-off-by: Aneesh Kumar K.V 
---
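As a back-of-envelope check of the new geometry (reviewer arithmetic,
not part of the patch), the 64K-page index sizes in the hash-64k.h hunk
below give:

	/* PTE=8, PMD=5, PUD=5, PGD=12, PAGE_SHIFT=16 (64K)
	 * PMD_SHIFT   = 16 + 8  = 24  -> 16MB  mapped per PMD entry
	 * PUD_SHIFT   = 24 + 5  = 29  -> 512MB mapped per PUD entry
	 * PGDIR_SHIFT = 29 + 5  = 34  -> 16GB  mapped per PGD entry
	 * Total VA    = 34 + 12 = 46 bits, the same as the old 3-level
	 * layout (16 + 8 + 10 + 12).
	 */
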
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 33 +--
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 20 +---
 arch/powerpc/include/asm/book3s/64/hash.h | 11 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 25 +++-
 arch/powerpc/include/asm/pgalloc-64.h | 28 ---
 arch/powerpc/include/asm/pgtable-types.h  | 13 +++
 arch/powerpc/mm/init_64.c | 21 -
 8 files changed, 97 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9faa18c4f3f7..599329332613 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -303,6 +303,7 @@ config ZONE_DMA32
 config PGTABLE_LEVELS
int
default 2 if !PPC64
+   default 4 if PPC_BOOK3S_64
default 3 if PPC_64K_PAGES
default 4
 
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 7f60f7e814d4..5f08a0832238 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -58,39 +58,8 @@
 #define _PAGE_4K_PFN   0
 #ifndef __ASSEMBLY__
 /*
- * 4-level page tables related bits
+ * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range()
  */
-
-#define pgd_none(pgd)  (!pgd_val(pgd))
-#define pgd_bad(pgd)   (pgd_val(pgd) == 0)
-#define pgd_present(pgd)   (pgd_val(pgd) != 0)
-#define pgd_page_vaddr(pgd)__va(pgd_val(pgd) & ~PGD_MASKED_BITS)
-
-static inline void pgd_clear(pgd_t *pgdp)
-{
-   *pgdp = __pgd(0);
-}
-
-static inline pte_t pgd_pte(pgd_t pgd)
-{
-   return __pte(pgd_val(pgd));
-}
-
-static inline pgd_t pte_pgd(pte_t pte)
-{
-   return __pgd(pte_val(pte));
-}
-extern struct page *pgd_page(pgd_t pgd);
-
-#define pud_offset(pgdp, addr) \
-  (((pud_t *) pgd_page_vaddr(*(pgdp))) + \
-(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)))
-
-#define pud_ERROR(e) \
-   pr_err("%s:%d: bad pud %08lx.\n", __FILE__, __LINE__, pud_val(e))
-
-/*
- * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range() */
 #define remap_4k_pfn(vma, addr, pfn, prot) \
remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, (prot))
 
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 8bb03251f34c..f0f5f91d7909 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -1,15 +1,14 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_HASH_64K_H
 #define _ASM_POWERPC_BOOK3S_64_HASH_64K_H
 
-#include <asm-generic/pgtable-nopud.h>
-
 #define PTE_INDEX_SIZE  8
-#define PMD_INDEX_SIZE  10
-#define PUD_INDEX_SIZE 0
+#define PMD_INDEX_SIZE  5
+#define PUD_INDEX_SIZE 5
 #define PGD_INDEX_SIZE  12
 
 #define PTRS_PER_PTE   (1 << PTE_INDEX_SIZE)
 #define PTRS_PER_PMD   (1 << PMD_INDEX_SIZE)
+#define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
 
 /* With 4k base page size, hugepage PTEs go at the PMD level */
@@ -20,8 +19,13 @@
 #define PMD_SIZE   (1UL << PMD_SHIFT)
 #define PMD_MASK   (~(PMD_SIZE-1))
 
+/* PUD_SHIFT determines what a third-level page table entry can map */
+#define PUD_SHIFT  (PMD_SHIFT + PMD_INDEX_SIZE)
+#define PUD_SIZE   (1UL << PUD_SHIFT)
+#define PUD_MASK   (~(PUD_SIZE-1))
+
 /* PGDIR_SHIFT determines what a third-level page table entry can map */
-#define PGDIR_SHIFT	(PMD_SHIFT + PMD_INDEX_SIZE)
+#define PGDIR_SHIFT	(PUD_SHIFT + PUD_INDEX_SIZE)
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
@@ -63,6 +67,8 @@
 #define PMD_MASKED_BITS		(PTE_FRAG_SIZE - 1)
 /* Bits to mask out from a PGD/PUD to get to the PMD page */
 #define PUD_MASKED_BITS		0x1ff
+/* FIXME!! Will be fixed in next patch */
+#define PGD_MASKED_BITS		0
 
 #ifndef __ASSEMBLY__
 
@@ -132,11 +138,9 @@ extern bool __rpte_sub_valid(real_pte_t rpte, unsigned 
long index);
 #else
 #define PMD_TABLE_SIZE (sizeof(pmd_t) << PMD_INDEX_SIZE)
 #endif
+#define PUD_TABLE_SIZE (sizeof(pud_t) << PUD_INDEX_SIZE)
 #define PGD_TABLE_SIZE (sizeof(pgd_t) << PGD_INDEX_SIZE)
 
-#define pgd_pte(pgd)   (pud_pte(((pud_t){ pgd })))
-#define pte_pgd(pte)   ((pgd_t)pte_pud(pte))
-
 #ifdef CONFIG_HUGETLB_PAGE
 /*
  * We have PGD_INDEX_SIZ = 12 and PTE_INDEX_SIZE = 8, so that we can have
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index ef9bd68f7e6d..d0ee6fcef823 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -235,6 +235,7 @@
 #define __pgtable_ptr_val(ptr) __pa(ptr)
 
 #define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_

[PATCH V4 05/18] powerpc/mm: Don't have conditional defines for real_pte_t

2016-02-22 Thread Aneesh Kumar K.V
We take real_pte_t out of STRICT_MM_TYPECHECKS.

Signed-off-by: Aneesh Kumar K.V 
---
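With the conditional gone, callers can rely on real_pte_t always being
the struct form. The consuming pattern, using the accessors as defined
in the hunk below (illustration only):

	real_pte_t rpte = __real_pte(*ptep, ptep);        /* wraps the pte     */
	unsigned long hidx = __rpte_to_hidx(rpte, index); /* slot bits from it */
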
 arch/powerpc/include/asm/book3s/64/pgtable.h |  5 -
 arch/powerpc/include/asm/pgtable-types.h | 26 +-
 2 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index c8240b737d11..7482f69117b6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -43,13 +43,8 @@
  */
 #ifndef __real_pte
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
 #define __real_pte(e,p)		((real_pte_t){(e)})
 #define __rpte_to_pte(r)	((r).pte)
-#else
-#define __real_pte(e,p)		(e)
-#define __rpte_to_pte(r)	(__pte(r))
-#endif
 #define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> _PAGE_F_GIX_SHIFT)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)   \
diff --git a/arch/powerpc/include/asm/pgtable-types.h 
b/arch/powerpc/include/asm/pgtable-types.h
index 2fac0c4acfa4..71487e1ca638 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -12,15 +12,6 @@ static inline pte_basic_t pte_val(pte_t x)
return x.pte;
 }
 
-/* 64k pages additionally define a bigger "real PTE" type that gathers
- * the "second half" part of the PTE for pseudo 64k pages
- */
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef struct { pte_t pte; } real_pte_t;
-#endif
-
 /* PMD level */
 #ifdef CONFIG_PPC64
 typedef struct { unsigned long pmd; } pmd_t;
@@ -67,13 +58,6 @@ static inline pte_basic_t pte_val(pte_t pte)
return pte;
 }
 
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef pte_t real_pte_t;
-#endif
-
-
 #ifdef CONFIG_PPC64
 typedef unsigned long pmd_t;
 #define __pmd(x)   (x)
@@ -103,6 +87,14 @@ typedef unsigned long pgprot_t;
 #define pgprot_val(x)  (x)
 #define __pgprot(x)(x)
 
+#endif /* CONFIG_STRICT_MM_TYPECHECKS */
+/*
+ * With hash config 64k pages additionally define a bigger "real PTE" type that
+ * gathers the "second half" part of the PTE for pseudo 64k pages
+ */
+#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
+typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
+#else
+typedef struct { pte_t pte; } real_pte_t;
 #endif
-
 #endif /* _ASM_POWERPC_PGTABLE_TYPES_H */
-- 
2.5.0


[PATCH V4 04/18] powerpc/mm: Split pgtable types to separate header

2016-02-22 Thread Aneesh Kumar K.V
We move the page table accessors into a separate header. We will
later add a big endian variant of the table, which is needed for radix.
No functional change, only code movement.

Signed-off-by: Aneesh Kumar K.V 
---
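For context, the STRICT_MM_TYPECHECKS variant that this new header
preserves wraps each level in a distinct struct so that mixed-up levels
fail to compile. A small illustration (not from the patch):

	pte_t pte = __pte(0x1);
	pmd_t pmd = __pmd(pte_val(pte)); /* explicit conversion: fine */
	/* pmd = pte;  would not compile under STRICT_MM_TYPECHECKS  */
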
 arch/powerpc/include/asm/page.h  | 104 +
 arch/powerpc/include/asm/pgtable-types.h | 108 +++
 2 files changed, 109 insertions(+), 103 deletions(-)
 create mode 100644 arch/powerpc/include/asm/pgtable-types.h

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index af7a3422a3ef..ab3d8977bacd 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -288,109 +288,7 @@ extern long long virt_phys_offset;
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
-/* These are used to make use of C type-checking. */
-
-/* PTE level */
-typedef struct { pte_basic_t pte; } pte_t;
-#define __pte(x)   ((pte_t) { (x) })
-static inline pte_basic_t pte_val(pte_t x)
-{
-   return x.pte;
-}
-
-/* 64k pages additionally define a bigger "real PTE" type that gathers
- * the "second half" part of the PTE for pseudo 64k pages
- */
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef struct { pte_t pte; } real_pte_t;
-#endif
-
-/* PMD level */
-#ifdef CONFIG_PPC64
-typedef struct { unsigned long pmd; } pmd_t;
-#define __pmd(x)   ((pmd_t) { (x) })
-static inline unsigned long pmd_val(pmd_t x)
-{
-   return x.pmd;
-}
-
-/* PUD level exists only on 4k pages */
-#ifndef CONFIG_PPC_64K_PAGES
-typedef struct { unsigned long pud; } pud_t;
-#define __pud(x)   ((pud_t) { (x) })
-static inline unsigned long pud_val(pud_t x)
-{
-   return x.pud;
-}
-#endif /* !CONFIG_PPC_64K_PAGES */
-#endif /* CONFIG_PPC64 */
-
-/* PGD level */
-typedef struct { unsigned long pgd; } pgd_t;
-#define __pgd(x)   ((pgd_t) { (x) })
-static inline unsigned long pgd_val(pgd_t x)
-{
-   return x.pgd;
-}
-
-/* Page protection bits */
-typedef struct { unsigned long pgprot; } pgprot_t;
-#define pgprot_val(x)  ((x).pgprot)
-#define __pgprot(x)	((pgprot_t) { (x) })
-
-#else
-
-/*
- * .. while these make it easier on the compiler
- */
-
-typedef pte_basic_t pte_t;
-#define __pte(x)   (x)
-static inline pte_basic_t pte_val(pte_t pte)
-{
-   return pte;
-}
-
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef pte_t real_pte_t;
-#endif
-
-
-#ifdef CONFIG_PPC64
-typedef unsigned long pmd_t;
-#define __pmd(x)   (x)
-static inline unsigned long pmd_val(pmd_t pmd)
-{
-   return pmd;
-}
-
-#ifndef CONFIG_PPC_64K_PAGES
-typedef unsigned long pud_t;
-#define __pud(x)   (x)
-static inline unsigned long pud_val(pud_t pud)
-{
-   return pud;
-}
-#endif /* !CONFIG_PPC_64K_PAGES */
-#endif /* CONFIG_PPC64 */
-
-typedef unsigned long pgd_t;
-#define __pgd(x)   (x)
-static inline unsigned long pgd_val(pgd_t pgd)
-{
-   return pgd;
-}
-
-typedef unsigned long pgprot_t;
-#define pgprot_val(x)  (x)
-#define __pgprot(x)	(x)
-
-#endif
+#include <asm/pgtable-types.h>
 
 typedef struct { signed long pd; } hugepd_t;
 
diff --git a/arch/powerpc/include/asm/pgtable-types.h 
b/arch/powerpc/include/asm/pgtable-types.h
new file mode 100644
index ..2fac0c4acfa4
--- /dev/null
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -0,0 +1,108 @@
+#ifndef _ASM_POWERPC_PGTABLE_TYPES_H
+#define _ASM_POWERPC_PGTABLE_TYPES_H
+
+#ifdef CONFIG_STRICT_MM_TYPECHECKS
+/* These are used to make use of C type-checking. */
+
+/* PTE level */
+typedef struct { pte_basic_t pte; } pte_t;
+#define __pte(x)   ((pte_t) { (x) })
+static inline pte_basic_t pte_val(pte_t x)
+{
+   return x.pte;
+}
+
+/* 64k pages additionally define a bigger "real PTE" type that gathers
+ * the "second half" part of the PTE for pseudo 64k pages
+ */
+#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
+typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
+#else
+typedef struct { pte_t pte; } real_pte_t;
+#endif
+
+/* PMD level */
+#ifdef CONFIG_PPC64
+typedef struct { unsigned long pmd; } pmd_t;
+#define __pmd(x)   ((pmd_t) { (x) })
+static inline unsigned long pmd_val(pmd_t x)
+{
+   return x.pmd;
+}
+
+/* PUD level exists only on 4k pages */
+#ifndef CONFIG_PPC_64K_PAGES
+typedef struct { unsigned long pud; } pud_t;
+#define __pud(x)   ((pud_t) { (x) })
+static inline unsigned long pud_val(pud_t x)
+{
+   return x.pud;
+}
+#endif /* !CONFIG_PPC_64K_PAGES */
+#endif /* CONFIG_PPC64 */
+
+/* PGD level */
+typedef struct { unsigned long pgd; } pgd_t;
+#define __pgd(x)   ((pgd_t) { (x) })
+static inline unsigned long pgd_val(pgd_t x)
+{
+   return x.pgd;
+}
+
+/* Page protection bits */
+typedef struct { unsigned long pgprot; } pgprot_t;
+#define pgprot_val(x)  ((x).pgprot)
+#define __pgprot(x)	((pgprot_t) { (x) })

[PATCH V4 03/18] powerpc/mm: add _PAGE_HASHPTE similar to 4K hash

2016-02-22 Thread Aneesh Kumar K.V
The difference between 64K and 4K hash fault handling is confusing
with respect to when we set _PAGE_HASHPTE in the linux pte.
I was trying to find out whether we miss a hpte flush in any
scenario because of this, i.e. a pte update on a linux pte for which we
are doing a parallel hash pte insert. After looking at it closer, my
understanding is that this won't happen, because the pte update also
looks at _PAGE_BUSY and we will wait for the hash pte insert to finish
before going ahead with the pte update. But to avoid further confusion,
keep the hash fault handler for all page sizes similar to __hash_page_4K.

This partially reverts commit 41743a4e34f0 ("powerpc: Free a PTE bit on
ppc64 with 64K pages").

Signed-off-by: Aneesh Kumar K.V 
---
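For readers following the locking argument above: the pattern in
question is the pte "lock" loop at the top of each hash fault handler,
roughly (simplified from the hunks below):

	do {
		old_pte = pte_val(*ptep);
		/* a parallel hash insert holds _PAGE_BUSY; make the caller retry */
		if (unlikely(old_pte & _PAGE_BUSY))
			return 0;
		new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
	} while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
					  old_pte, new_pte));
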
 arch/powerpc/mm/hash64_64k.c | 4 ++--
 arch/powerpc/mm/hugepage-hash64.c| 2 +-
 arch/powerpc/mm/hugetlbpage-hash64.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index b2d659cf51c6..507c1e55a424 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -76,7 +76,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * a write access. Since this is 4K insert of 64K page size
 * also add _PAGE_COMBO
 */
-   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_COMBO;
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_COMBO | 
_PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pte |= _PAGE_DIRTY;
} while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
@@ -251,7 +251,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 * Try to lock the PTE, add ACCESSED and DIRTY if it was
 * a write access.
 */
-   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pte |= _PAGE_DIRTY;
} while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
diff --git a/arch/powerpc/mm/hugepage-hash64.c 
b/arch/powerpc/mm/hugepage-hash64.c
index eb2accdd76fd..56d677b7972c 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -46,7 +46,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * Try to lock the PTE, add ACCESSED and DIRTY if it was
 * a write access
 */
-   new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED;
+   new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pmd |= _PAGE_DIRTY;
} while (old_pmd != __cmpxchg_u64((unsigned long *)pmdp,
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index 8555fce902fe..08efcad7cae0 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -54,7 +54,7 @@ int __hash_page_huge(unsigned long ea, unsigned long access, 
unsigned long vsid,
return 1;
/* Try to lock the PTE, add ACCESSED and DIRTY if it was
 * a write access */
-   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
if (access & _PAGE_RW)
new_pte |= _PAGE_DIRTY;
} while(old_pte != __cmpxchg_u64((unsigned long *)ptep,
-- 
2.5.0


[PATCH V4 02/18] mm: Some arch may want to use HPAGE_PMD related values as variables

2016-02-22 Thread Aneesh Kumar K.V
From: "Kirill A. Shutemov" 

With the next generation POWER processor, we have a new MMU model
[1] that requires us to maintain a different linux page table format.

In order to support both current and future ppc64 systems with a single
kernel, we need to make sure the kernel can select between the different
page table formats at runtime. With the new MMU (radix MMU) added, we
will have two different pmd hugepage sizes: 16MB for the hash model and
2MB for the radix model. Hence make the HPAGE_PMD related values variables.

Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a
followup patch.

[1] http://ibm.biz/power-isa3 (Needs registration).

Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Aneesh Kumar K.V 
---
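A note on the new macro (see the bug.h hunk below): MAYBE_BUILD_BUG_ON()
picks the strongest check available for its argument. For example:

	/* HPAGE_PMD_ORDER a compile-time constant: caught at build time */
	MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER >= MAX_ORDER);
	/* once ppc64 makes it a variable, the same line compiles to a
	 * runtime BUG_ON() instead */
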
 arch/powerpc/mm/pgtable_64.c |  7 +++
 include/linux/bug.h  |  9 +
 include/linux/huge_mm.h  |  3 ---
 mm/huge_memory.c | 17 ++---
 4 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index af304e6d5a89..0eb53128ca2a 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -817,6 +817,13 @@ pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 
 int has_transparent_hugepage(void)
 {
+
+   BUILD_BUG_ON_MSG((PMD_SHIFT - PAGE_SHIFT) >= MAX_ORDER,
+   "hugepages can't be allocated by the buddy allocator");
+
+   BUILD_BUG_ON_MSG((PMD_SHIFT - PAGE_SHIFT) < 2,
+"We need more than 2 pages to do deferred thp split");
+
if (!mmu_has_feature(MMU_FTR_16M_PAGE))
return 0;
/*
diff --git a/include/linux/bug.h b/include/linux/bug.h
index 7f4818673c41..e51b0709e78d 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -20,6 +20,7 @@ struct pt_regs;
 #define BUILD_BUG_ON_MSG(cond, msg) (0)
 #define BUILD_BUG_ON(condition) (0)
 #define BUILD_BUG() (0)
+#define MAYBE_BUILD_BUG_ON(cond) (0)
 #else /* __CHECKER__ */
 
 /* Force a compilation error if a constant expression is not a power of 2 */
@@ -83,6 +84,14 @@ struct pt_regs;
  */
 #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
 
+#define MAYBE_BUILD_BUG_ON(cond)   \
+   do {\
+   if (__builtin_constant_p((cond)))   \
+   BUILD_BUG_ON(cond); \
+   else\
+   BUG_ON(cond);   \
+   } while (0)
+
 #endif /* __CHECKER__ */
 
 #ifdef CONFIG_GENERIC_BUG
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 459fd25b378e..f12513a20a06 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -111,9 +111,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t 
*pmd,
__split_huge_pmd(__vma, __pmd, __address);  \
}  while (0)
 
-#if HPAGE_PMD_ORDER >= MAX_ORDER
-#error "hugepages can't be allocated by the buddy allocator"
-#endif
 extern int hugepage_madvise(struct vm_area_struct *vma,
unsigned long *vm_flags, int advice);
 extern void vma_adjust_trans_huge(struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1c317b85ea7d..0f4ad6374131 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -83,7 +83,7 @@ unsigned long transparent_hugepage_flags __read_mostly =
+	/*
+	 * hugepages can't be allocated by the buddy allocator
+	 */
+	MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER >= MAX_ORDER);
+   /*
+* we use page->mapping and page->index in second tail page
+* as list_head: assuming THP order >= 2
+*/
+   MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER < 2);
+
err = hugepage_init_sysfs(&hugepage_kobj);
if (err)
goto err_sysfs;
@@ -764,7 +776,6 @@ void prep_transhuge_page(struct page *page)
 * we use page->mapping and page->indexlru in second tail page
 * as list_head: assuming THP order >= 2
 */
-   BUILD_BUG_ON(HPAGE_PMD_ORDER < 2);
 
INIT_LIST_HEAD(page_deferred_list(page));
set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
-- 
2.5.0


[PATCH V4 01/18] powerp/mm: Update code comments

2016-02-22 Thread Aneesh Kumar K.V
We are updating the pte in those functions, so fix the code comments to
say pte rather than pmd.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hash64_4k.c  | 2 +-
 arch/powerpc/mm/hash64_64k.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index e7c04542ba62..e3e76b929f33 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -106,7 +106,7 @@ repeat:
}
}
/*
-* Hypervisor failure. Restore old pmd and return -1
+* Hypervisor failure. Restore old pte and return -1
 * similar to __hash_page_*
 */
if (unlikely(slot == -2)) {
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index ef6fac6d773c..b2d659cf51c6 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -188,7 +188,7 @@ repeat:
}
}
/*
-* Hypervisor failure. Restore old pmd and return -1
+* Hypervisor failure. Restore old pte and return -1
 * similar to __hash_page_*
 */
if (unlikely(slot == -2)) {
@@ -310,7 +310,7 @@ repeat:
}
}
/*
-* Hypervisor failure. Restore old pmd and return -1
+* Hypervisor failure. Restore old pte and return -1
 * similar to __hash_page_*
 */
if (unlikely(slot == -2)) {
-- 
2.5.0


[PATCH V4 00/18] Book3s abstraction in preparation for new MMU model

2016-02-22 Thread Aneesh Kumar K.V

Hello,

This series consists mostly of code movement. One new thing added in
this series is to switch book3s 64 to a 4 level page table. The changes
are done to accommodate the upcoming new memory model in future powerpc
chips. The details of the new MMU model can be found at

 http://ibm.biz/power-isa3 (needs registration). I am including a
summary of the changes below.

ISA 3.0 adds support for the radix tree style of MMU with full
virtualization and related control mechanisms that manage its
coexistence with the HPT. Radix-using operating systems will
manage their own translation tables instead of relying on hcalls.

The radix style MMU model requires us to do a 4 level page table
with 64K and 4K page sizes. The table index sizes for the different
page sizes are listed below:

PGD -> 13 bits
PUD -> 9 (1G hugepage)
PMD -> 9 (2M huge page)
PTE -> 5 (for 64k), 9 (for 4k)
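
(A quick sanity check of those index sizes, with the base page shift
added in: 64K pages give 16 + 5 + 9 + 9 + 13 = 52 bits of virtual
address, and 4K pages give 12 + 9 + 9 + 9 + 13 = 52 bits as well.)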

We also require the page table to be in big endian format.

Changes from V3:
 * rebase on top of PTE bits movement patch series
 * Drop all the hash linux abstraction patches
 * Keep only 4 level table and other code movement patches.

Changes from V2:
 * rebase to latest kernel
 * Update commit messages
 * address review comments

Changes from V1:
* move patches adding helpers to the next series


NOTE:
 This is lightly tested. Right now the 4K linux page size is what is
 being tested. Once that is done, I will have to do the 64K linux page
 size tests.

-aneesh

Aneesh Kumar K.V (17):
  powerp/mm: Update code comments
  powerpc/mm: add _PAGE_HASHPTE similar to 4K hash
  powerpc/mm: Split pgtable types to separate header
  powerpc/mm: Don't have conditional defines for real_pte_t
  powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table
  powerpc/mm: Update masked bits for linux page table
  powerpc/mm: Copy pgalloc (part 1)
  powerpc/mm: Copy pgalloc (part 2)
  powerpc/mm: Copy pgalloc (part 3)
  powerpc/mm: Hugetlbfs is book3s_64 and fsl_book3e (32 or 64)
  powerpc/mm: Use flush_tlb_page in ptep_clear_flush_young
  powerpc/mm: Move hash related mmu-*.h headers to book3s/
  powerpc/mm: Create a new headers for tlbflush for hash64
  powerpc/mm: Move hash page table related functions to pgtable-hash64.c
  powerpc/mm: THP is only available on hash64 as of now
  powerpc/mm: Use generic version of pmdp_clear_flush_young
  powerpc/mm: Move hash64 specific definitions to separate header

Kirill A. Shutemov (1):
  mm: Some arch may want to use HPAGE_PMD related values as variables

 arch/powerpc/Kconfig   |   1 +
 .../asm/{mmu-hash32.h => book3s/32/mmu-hash.h} |   0
 arch/powerpc/include/asm/book3s/32/pgalloc.h   | 109 
 arch/powerpc/include/asm/book3s/64/hash-4k.h   |  33 +-
 arch/powerpc/include/asm/book3s/64/hash-64k.h  |  31 +-
 arch/powerpc/include/asm/book3s/64/hash.h  |  58 +-
 .../asm/{mmu-hash64.h => book3s/64/mmu-hash.h} |   2 +-
 .../include/asm/book3s/64/pgalloc-hash-4k.h|  92 +++
 .../include/asm/book3s/64/pgalloc-hash-64k.h   |  51 ++
 arch/powerpc/include/asm/book3s/64/pgalloc-hash.h  |  59 ++
 arch/powerpc/include/asm/book3s/64/pgalloc.h   |  69 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h   |  72 ++-
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h |  94 
 arch/powerpc/include/asm/book3s/pgalloc.h  |  19 +
 arch/powerpc/include/asm/hugetlb.h |   1 +
 arch/powerpc/include/asm/mmu.h |   4 +-
 .../asm/{pgalloc-32.h => nohash/32/pgalloc.h}  |   0
 .../asm/{pgalloc-64.h => nohash/64/pgalloc.h}  |  28 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h   |   3 +
 arch/powerpc/include/asm/nohash/pgalloc.h  |  23 +
 arch/powerpc/include/asm/page.h| 104 +---
 arch/powerpc/include/asm/pgalloc.h |  19 +-
 arch/powerpc/include/asm/pgtable-types.h   | 103 
 arch/powerpc/include/asm/tlbflush.h|  92 +--
 arch/powerpc/kernel/idle_power7.S  |   2 +-
 arch/powerpc/kvm/book3s_32_mmu_host.c  |   2 +-
 arch/powerpc/kvm/book3s_64_mmu.c   |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c  |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c|   2 +-
 arch/powerpc/kvm/book3s_64_vio.c   |   2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c|   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c|   2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S|   2 +-
 arch/powerpc/mm/Makefile   |   3 +-
 arch/powerpc/mm/hash64_4k.c|   2 +-
 arch/powerpc/mm/hash64_64k.c   |   8 +-
 arch/powerpc/mm/hugepage-hash64.c  |   2 +-
 arch/powerpc/mm/hugetlbpage-book3e.c   | 293 ++
 arch/powerpc/mm/hugetlbpage-hash64.c   | 122 +++-
 arch/powerpc/mm/hugetlbpage.c  | 401 +-
 arch/powerpc/mm/init_64.c  | 105 

[PATCH v5 9/9] powerpc: Add the ability to save VSX without giving it up

2016-02-22 Thread Cyril Bur
This patch adds the ability to save the VSX registers to the thread
struct without giving the facility up, i.e. without disabling it for
the next time the process returns to userspace.

This patch builds on a previous optimisation for the FPU and VEC registers
in the thread copy path to avoid a possibly pointless reload of VSX state.

Signed-off-by: Cyril Bur 
---
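For context on why this works: the 64 VSX registers architecturally
overlay the FP and VMX register files, so saving those two sets
captures all of the VSX state:

	/* VSR[ 0..31] <-> FPR[0..31] (each FPR is one half of a VSR) */
	/* VSR[32..63] <-> VR[0..31]                                  */
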
 arch/powerpc/include/asm/switch_to.h |  4 
 arch/powerpc/kernel/ppc_ksyms.c  |  4 
 arch/powerpc/kernel/process.c| 42 +---
 arch/powerpc/kernel/vector.S | 17 ---
 4 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 9028822..17c8380 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -56,14 +56,10 @@ static inline void __giveup_altivec(struct task_struct *t) 
{ }
 #ifdef CONFIG_VSX
 extern void enable_kernel_vsx(void);
 extern void flush_vsx_to_thread(struct task_struct *);
-extern void giveup_vsx(struct task_struct *);
-extern void __giveup_vsx(struct task_struct *);
 static inline void disable_kernel_vsx(void)
 {
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
 }
-#else
-static inline void __giveup_vsx(struct task_struct *t) { }
 #endif
 
 #ifdef CONFIG_SPE
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 41e1607..ef7024da 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state);
 EXPORT_SYMBOL(store_vr_state);
 #endif
 
-#ifdef CONFIG_VSX
-EXPORT_SYMBOL_GPL(__giveup_vsx);
-#endif
-
 #ifdef CONFIG_EPAPR_PARAVIRT
 EXPORT_SYMBOL(epapr_hypercall_start);
 #endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 14c09d2..d7a9df5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -280,19 +280,31 @@ static inline int restore_altivec(struct task_struct 
*tsk) { return 0; }
 #endif /* CONFIG_ALTIVEC */
 
 #ifdef CONFIG_VSX
-void giveup_vsx(struct task_struct *tsk)
+static void __giveup_vsx(struct task_struct *tsk)
 {
-   check_if_tm_restore_required(tsk);
-
-   msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
if (tsk->thread.regs->msr & MSR_FP)
__giveup_fpu(tsk);
if (tsk->thread.regs->msr & MSR_VEC)
__giveup_altivec(tsk);
+   tsk->thread.regs->msr &= ~MSR_VSX;
+}
+
+static void giveup_vsx(struct task_struct *tsk)
+{
+   check_if_tm_restore_required(tsk);
+
+   msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
__giveup_vsx(tsk);
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
 }
-EXPORT_SYMBOL(giveup_vsx);
+
+static void save_vsx(struct task_struct *tsk)
+{
+   if (tsk->thread.regs->msr & MSR_FP)
+   save_fpu(tsk);
+   if (tsk->thread.regs->msr & MSR_VEC)
+   save_altivec(tsk);
+}
 
 void enable_kernel_vsx(void)
 {
@@ -335,6 +347,7 @@ static int restore_vsx(struct task_struct *tsk)
 }
 #else
 static inline int restore_vsx(struct task_struct *tsk) { return 0; }
+static inline void save_vsx(struct task_struct *tsk) { }
 #endif /* CONFIG_VSX */
 
 #ifdef CONFIG_SPE
@@ -478,14 +491,19 @@ void save_all(struct task_struct *tsk)
 
msr_check_and_set(msr_all_available);
 
-   if (usermsr & MSR_FP)
-   save_fpu(tsk);
-
-   if (usermsr & MSR_VEC)
-   save_altivec(tsk);
+   /*
+* Saving the way the register space is in hardware, save_vsx boils
+* down to a save_fpu() and save_altivec()
+*/
+   if (usermsr & MSR_VSX) {
+   save_vsx(tsk);
+   } else {
+   if (usermsr & MSR_FP)
+   save_fpu(tsk);
 
-   if (usermsr & MSR_VSX)
-   __giveup_vsx(tsk);
+   if (usermsr & MSR_VEC)
+   save_altivec(tsk);
+   }
 
if (usermsr & MSR_SPE)
__giveup_spe(tsk);
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 51b0c17..1c2e7a3 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx)
std r12,_MSR(r1)
b   fast_exception_return
 
-/*
- * __giveup_vsx(tsk)
- * Disable VSX for the task given as the argument.
- * Does NOT save vsx registers.
- */
-_GLOBAL(__giveup_vsx)
-	addi	r3,r3,THREAD	/* want THREAD of task */
-	ld	r5,PT_REGS(r3)
-	cmpdi	0,r5,0
-	beq	1f
-	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-	lis	r3,MSR_VSX@h
-	andc	r4,r4,r3	/* disable VSX for previous task */
-   std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
-   blr
-
 #endif /* CONFIG_VSX */
 
 
-- 
2.7.1


[PATCH v5 2/9] selftests/powerpc: Test preservation of FPU and VMX regs across preemption

2016-02-22 Thread Cyril Bur
Loop in assembly checking the registers, with many threads.

Signed-off-by: Cyril Bur 
---
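The C drivers below follow the same shape; roughly, in the main test
thread (a sketch of the flow, using the names from fpu_preempt.c):

	threads_starting = threads;
	running = 1;
	/* spawn THREAD_FACTOR * ncpus workers, each calling preempt_fpu() */
	while (threads_starting)	/* wait for all workers to load regs */
		asm volatile("" ::: "memory");
	sleep(PREEMPT_TIME);		/* give the scheduler time to preempt */
	running = 0;			/* workers leave their check loop */
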
 tools/testing/selftests/powerpc/math/.gitignore|   2 +
 tools/testing/selftests/powerpc/math/Makefile  |   5 +-
 tools/testing/selftests/powerpc/math/fpu_asm.S |  36 +++
 tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 +
 tools/testing/selftests/powerpc/math/vmx_asm.S |  43 +++-
 tools/testing/selftests/powerpc/math/vmx_preempt.c | 112 
 6 files changed, 308 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c

diff --git a/tools/testing/selftests/powerpc/math/.gitignore 
b/tools/testing/selftests/powerpc/math/.gitignore
index b19b269..1a6f09e 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -1,2 +1,4 @@
 fpu_syscall
 vmx_syscall
+fpu_preempt
+vmx_preempt
diff --git a/tools/testing/selftests/powerpc/math/Makefile 
b/tools/testing/selftests/powerpc/math/Makefile
index 598e5df..5f5617c 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall vmx_syscall
+TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
 
 all: $(TEST_PROGS)
 
@@ -8,7 +8,10 @@ $(TEST_PROGS): ../harness.c
 $(TEST_PROGS): CFLAGS = $(filter-out -flto,$(CFLAGS) -O2 -g -pthread -m64 
-maltivec)
 
 fpu_syscall: fpu_asm.S
+fpu_preempt: fpu_asm.S
+
 vmx_syscall: vmx_asm.S
+vmx_preempt: vmx_asm.S
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S 
b/tools/testing/selftests/powerpc/math/fpu_asm.S
index b12c051..6d9ac4d4 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -159,3 +159,39 @@ FUNC_START(test_fpu)
POP_BASIC_STACK(256)
blr
 FUNC_END(test_fpu)
+
+#int preempt_fpu(double *darray, int *threads_starting, int *running)
+#On starting will (atomically) decrement threads_starting as a signal that
+#the FPU has been loaded with darray. Will proceed to check the validity of
+#the FPU registers while running is not zero.
+FUNC_START(preempt_fpu)
+   PUSH_BASIC_STACK(256)
+   std r3,STACK_FRAME_PARAM(0)(sp) #double *darray
+   std r4,STACK_FRAME_PARAM(1)(sp) #int *threads_starting
+   std r5,STACK_FRAME_PARAM(2)(sp) #int *running
+   PUSH_FPU(STACK_FRAME_LOCAL(3,0))
+
+   bl load_fpu
+   nop
+
+   #Atomic DEC
+   ld r3,STACK_FRAME_PARAM(1)(sp)
+1: lwarx r4,0,r3
+   addi r4,r4,-1
+   stwcx. r4,0,r3
+   bne- 1b
+
+2: ld r3,STACK_FRAME_PARAM(0)(sp)
+   bl check_fpu
+   nop
+   cmpdi r3,0
+   bne 3f
+   ld r4,STACK_FRAME_PARAM(2)(sp)
+   ld r5,0(r4)
+   cmpwi r5,0
+   bne 2b
+
+3: POP_FPU(STACK_FRAME_LOCAL(3,0))
+   POP_BASIC_STACK(256)
+   blr
+FUNC_END(preempt_fpu)
diff --git a/tools/testing/selftests/powerpc/math/fpu_preempt.c 
b/tools/testing/selftests/powerpc/math/fpu_preempt.c
new file mode 100644
index 000..0f85b79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_preempt.c
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers change across preemption.
+ * Two things should be noted here: a) the check_fpu function in asm only
+ * checks the non-volatile registers, as it is reused from the syscall test;
+ * b) there is no way to be sure preemption happened, so this test just uses
+ * many threads and a long wait. As such, a successful test doesn't mean much,
+ * but a failure is bad.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+2.1};
+
+int threads_starting;
+int running;
+
+extern void preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void *preempt_fpu_c(void *p)
+{
+   int i;
+   srand(pthread_self());
+   for (i = 0; i < 21; i++)
+   darray[i] = rand();
+
+   /* Test failed if it ever returns */
+   preempt_fpu(darray, &threads_starting, &running);
+
+   return p;
+}
+
+int test_preempt_fpu(void)
+{
+   int i, rc, threads;
+   pthread_t *tids;
+
+ 

[PATCH v5 7/9] powerpc: Add the ability to save FPU without giving it up

2016-02-22 Thread Cyril Bur
This patch adds the ability to save the FPU registers to the thread
struct without giving the facility up, i.e. without disabling it for
the next time the process returns to userspace.

This patch optimises the thread copy path (as a result of a fork() or
clone()) so that the parent thread can return to userspace with hot
registers, avoiding a possibly pointless reload of FPU register state.

Signed-off-by: Cyril Bur 
---
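Conceptually the split is: giveup = save + disable. A sketch of the
resulting relationship (the real definitions are in the hunks below):

	/* save_fpu():     write FP state to thread_struct, leave MSR_FP set  */
	/* __giveup_fpu(): save_fpu() + clear MSR_FP (and MSR_VSX if present) */
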
 arch/powerpc/include/asm/switch_to.h |  3 ++-
 arch/powerpc/kernel/fpu.S| 21 -
 arch/powerpc/kernel/process.c| 12 +++-
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 3690041..6a201e8 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -28,13 +28,14 @@ extern void giveup_all(struct task_struct *);
 extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
 extern void giveup_fpu(struct task_struct *);
-extern void __giveup_fpu(struct task_struct *);
+extern void save_fpu(struct task_struct *);
 static inline void disable_kernel_fp(void)
 {
msr_check_and_clear(MSR_FP);
 }
 #else
 static inline void __giveup_fpu(struct task_struct *t) { }
+static inline void save_fpu(struct task_struct *t) { }
 static inline void flush_fp_to_thread(struct task_struct *t) { }
 #endif
 
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index b063524..15da2b5 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
blr
 
 /*
- * __giveup_fpu(tsk)
- * Disable FP for the task given as the argument,
- * and save the floating-point registers in its thread_struct.
+ * save_fpu(tsk)
+ * Save the floating-point registers in its thread_struct.
  * Enables the FPU for use in the kernel on return.
  */
-_GLOBAL(__giveup_fpu)
+_GLOBAL(save_fpu)
	addi	r3,r3,THREAD		/* want THREAD of task */
	PPC_LL	r6,THREAD_FPSAVEAREA(r3)
	PPC_LL	r5,PT_REGS(r3)
	PPC_LCMPI	0,r6,0
	bne	2f
	addi	r6,r3,THREAD_FPSTATE
-2:	PPC_LCMPI	0,r5,0
-	SAVE_32FPVSRS(0, R4, R6)
+2:	SAVE_32FPVSRS(0, R4, R6)
	mffs	fr0
	stfd	fr0,FPSTATE_FPSCR(r6)
-   beq 1f
-   PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-   li  r3,MSR_FP|MSR_FE0|MSR_FE1
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-   orisr3,r3,MSR_VSX@h
-END_FTR_SECTION_IFSET(CPU_FTR_VSX)
-#endif
-	andc	r4,r4,r3	/* disable FP for previous task */
-   PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
 
 /*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 29da07f..a7e5061 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
 EXPORT_SYMBOL(__msr_check_and_clear);
 
 #ifdef CONFIG_PPC_FPU
+void __giveup_fpu(struct task_struct *tsk)
+{
+   save_fpu(tsk);
+   tsk->thread.regs->msr &= ~MSR_FP;
+#ifdef CONFIG_VSX
+   if (cpu_has_feature(CPU_FTR_VSX))
+   tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
 void giveup_fpu(struct task_struct *tsk)
 {
check_if_tm_restore_required(tsk);
@@ -459,7 +469,7 @@ void save_all(struct task_struct *tsk)
msr_check_and_set(msr_all_available);
 
if (usermsr & MSR_FP)
-   __giveup_fpu(tsk);
+   save_fpu(tsk);
 
if (usermsr & MSR_VEC)
__giveup_altivec(tsk);
-- 
2.7.1


[PATCH v5 8/9] powerpc: Add the ability to save Altivec without giving it up

2016-02-22 Thread Cyril Bur
This patch adds the ability to save the VEC registers to the thread
struct without giving the facility up, i.e. without disabling it for
the next time the process returns to userspace.

This patch builds on a previous optimisation for the FPU registers in the
thread copy path to avoid a possibly pointless reload of VEC state.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/switch_to.h |  3 ++-
 arch/powerpc/kernel/process.c| 12 +++-
 arch/powerpc/kernel/vector.S | 24 
 3 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 6a201e8..9028822 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -43,12 +43,13 @@ static inline void flush_fp_to_thread(struct task_struct 
*t) { }
 extern void enable_kernel_altivec(void);
 extern void flush_altivec_to_thread(struct task_struct *);
 extern void giveup_altivec(struct task_struct *);
-extern void __giveup_altivec(struct task_struct *);
+extern void save_altivec(struct task_struct *);
 static inline void disable_kernel_altivec(void)
 {
msr_check_and_clear(MSR_VEC);
 }
 #else
+static inline void save_altivec(struct task_struct *t) { }
 static inline void __giveup_altivec(struct task_struct *t) { }
 #endif
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a7e5061..14c09d2 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -213,6 +213,16 @@ static int restore_fp(struct task_struct *tsk) { return 0; 
}
 #ifdef CONFIG_ALTIVEC
 #define loadvec(thr) ((thr).load_vec)
 
+static void __giveup_altivec(struct task_struct *tsk)
+{
+   save_altivec(tsk);
+   tsk->thread.regs->msr &= ~MSR_VEC;
+#ifdef CONFIG_VSX
+   if (cpu_has_feature(CPU_FTR_VSX))
+   tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
 void giveup_altivec(struct task_struct *tsk)
 {
check_if_tm_restore_required(tsk);
@@ -472,7 +482,7 @@ void save_all(struct task_struct *tsk)
save_fpu(tsk);
 
if (usermsr & MSR_VEC)
-   __giveup_altivec(tsk);
+   save_altivec(tsk);
 
if (usermsr & MSR_VSX)
__giveup_vsx(tsk);
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 038cff8..51b0c17 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -106,36 +106,20 @@ _GLOBAL(load_up_altivec)
blr
 
 /*
- * __giveup_altivec(tsk)
- * Disable VMX for the task given as the argument,
- * and save the vector registers in its thread_struct.
+ * save_altivec(tsk)
+ * Save the vector registers to its thread_struct
  */
-_GLOBAL(__giveup_altivec)
+_GLOBAL(save_altivec)
	addi	r3,r3,THREAD		/* want THREAD of task */
	PPC_LL	r7,THREAD_VRSAVEAREA(r3)
	PPC_LL	r5,PT_REGS(r3)
	PPC_LCMPI	0,r7,0
	bne	2f
	addi	r7,r3,THREAD_VRSTATE
-2:	PPC_LCMPI	0,r5,0
-	SAVE_32VRS(0,r4,r7)
+2:	SAVE_32VRS(0,r4,r7)
	mfvscr	v0
	li	r4,VRSTATE_VSCR
	stvx	v0,r4,r7
-   beq 1f
-   PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-   lis r3,(MSR_VEC|MSR_VSX)@h
-FTR_SECTION_ELSE
-   lis r3,MSR_VEC@h
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_VSX)
-#else
-   lis r3,MSR_VEC@h
-#endif
-	andc	r4,r4,r3	/* disable FP for previous task */
-   PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
 
 #ifdef CONFIG_VSX
-- 
2.7.1


[PATCH v5 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two

2016-02-22 Thread Cyril Bur
This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
marking {fpu,altivec,vsx} as being unused by a thread.

Currently giveup_{fpu,altivec,vsx}() does both; however, optimisations
to task switching can be made if these two operations are decoupled.
save_all() will permit saving the registers to the thread struct while
leaving the facility bits enabled in the thread's MSR.

This patch introduces no functional change.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/reg.h   |  8 
 arch/powerpc/include/asm/switch_to.h |  7 +++
 arch/powerpc/kernel/process.c| 31 ++-
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c4cb2ff..d07b110 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -75,6 +75,14 @@
 #define MSR_HV 0
 #endif
 
+/*
+ * To be used in shared book E/book S, this avoids needing to worry about
+ * book S/book E in shared code
+ */
+#ifndef MSR_SPE
+#define MSR_SPE	0
+#endif
+
 #define MSR_VEC	__MASK(MSR_VEC_LG)	/* Enable AltiVec */
 #define MSR_VSX	__MASK(MSR_VSX_LG)	/* Enable VSX */
 #define MSR_POW	__MASK(MSR_POW_LG)	/* Enable Power Management */
diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 5b268b6..3690041 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -34,6 +34,7 @@ static inline void disable_kernel_fp(void)
msr_check_and_clear(MSR_FP);
 }
 #else
+static inline void __giveup_fpu(struct task_struct *t) { }
 static inline void flush_fp_to_thread(struct task_struct *t) { }
 #endif
 
@@ -46,6 +47,8 @@ static inline void disable_kernel_altivec(void)
 {
msr_check_and_clear(MSR_VEC);
 }
+#else
+static inline void __giveup_altivec(struct task_struct *t) { }
 #endif
 
 #ifdef CONFIG_VSX
@@ -57,6 +60,8 @@ static inline void disable_kernel_vsx(void)
 {
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
 }
+#else
+static inline void __giveup_vsx(struct task_struct *t) { }
 #endif
 
 #ifdef CONFIG_SPE
@@ -68,6 +73,8 @@ static inline void disable_kernel_spe(void)
 {
msr_check_and_clear(MSR_SPE);
 }
+#else
+static inline void __giveup_spe(struct task_struct *t) { }
 #endif
 
 static inline void clear_task_ebb(struct task_struct *t)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 55c1eb0..29da07f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -444,12 +444,41 @@ void restore_math(struct pt_regs *regs)
regs->msr = msr;
 }
 
+void save_all(struct task_struct *tsk)
+{
+   unsigned long usermsr;
+
+   if (!tsk->thread.regs)
+   return;
+
+   usermsr = tsk->thread.regs->msr;
+
+   if ((usermsr & msr_all_available) == 0)
+   return;
+
+   msr_check_and_set(msr_all_available);
+
+   if (usermsr & MSR_FP)
+   __giveup_fpu(tsk);
+
+   if (usermsr & MSR_VEC)
+   __giveup_altivec(tsk);
+
+   if (usermsr & MSR_VSX)
+   __giveup_vsx(tsk);
+
+   if (usermsr & MSR_SPE)
+   __giveup_spe(tsk);
+
+   msr_check_and_clear(msr_all_available);
+}
+
 void flush_all_to_thread(struct task_struct *tsk)
 {
if (tsk->thread.regs) {
preempt_disable();
BUG_ON(tsk != current);
-   giveup_all(tsk);
+   save_all(tsk);
 
 #ifdef CONFIG_SPE
if (tsk->thread.regs->msr & MSR_SPE)
-- 
2.7.1


[PATCH v5 5/9] powerpc: Restore FPU/VEC/VSX if previously used

2016-02-22 Thread Cyril Bur
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a
problem unless a process is using these facilities.

Modern versions of GCC are very good at automatically vectorising code;
new and modernised workloads make use of floating point and vector
facilities, and even the kernel makes use of vectorised memcpy.

All this combined greatly increases the cost of a syscall, since the
kernel uses the facilities sometimes even in the syscall fast path,
making it increasingly common for a thread to take an *_unavailable
exception soon after a syscall, not to mention potentially taking all
three.

The obvious overcompensation to this problem is to simply always load all
the facilities on every exit to userspace. Loading up all FPU, VEC and VSX
registers every time can be expensive and if a workload does avoid using
them, it should not be forced to incur this penalty.

An 8-bit counter is used to detect if the registers have been used in
the past, and the registers are always loaded until the value wraps
back to zero.

Several versions of the assembly in entry_64.S were tried: 1. always
calling C; 2. performing a common case check and then calling C; and
3. a complex check in asm. After some benchmarking it was determined
that avoiding C in the common case is a performance benefit. The full
check in asm greatly complicated that code path for a negligible
performance gain and the trade-off was deemed not worth it.

Signed-off-by: Cyril Bur 
---
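A simplified sketch of how the counter gates the eager reload (the real
logic is in the restore_math() changes below; load_fp is the new
thread_struct field, and restore_fp_sketch is an illustrative name):

	static int restore_fp_sketch(struct task_struct *tsk)
	{
		if (!tsk->thread.load_fp)
			return 0;		/* stay lazy */
		load_fp_state(&current->thread.fp_state);
		current->thread.load_fp++;	/* u8: wraps to 0, lazy again */
		return 1;
	}
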
 arch/powerpc/include/asm/processor.h |  2 +
 arch/powerpc/kernel/asm-offsets.c|  2 +
 arch/powerpc/kernel/entry_64.S   | 21 +++--
 arch/powerpc/kernel/fpu.S|  4 ++
 arch/powerpc/kernel/process.c| 88 +++-
 arch/powerpc/kernel/vector.S |  4 ++
 6 files changed, 107 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ac23308..dcab21f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -236,11 +236,13 @@ struct thread_struct {
 #endif
struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
unsigned long   trap_nr;/* last trap # on this thread */
+   u8 load_fp;
 #ifdef CONFIG_ALTIVEC
struct thread_vr_state vr_state;
struct thread_vr_state *vr_save_area;
unsigned long   vrsave;
int used_vr;/* set if process has used altivec */
+   u8 load_vec;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
/* VSR status */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 07cebc3..10d5eab 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -95,12 +95,14 @@ int main(void)
DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct, fp_state));
DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct, fp_save_area));
DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state, fpscr));
+   DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct, load_fp));
 #ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct, vr_state));
DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct, vr_save_area));
DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state, vscr));
+   DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct, load_vec));
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0d525ce..038e0a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -210,7 +210,20 @@ system_call:   /* label this so stack 
traces look sane */
li  r11,-MAX_ERRNO
andi.   
r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne-syscall_exit_work
-   cmpld   r3,r11
+
+   andi.   r0,r8,MSR_FP
+   beq 2f
+#ifdef CONFIG_ALTIVEC
+   andis.  r0,r8,MSR_VEC@h
+   bne 3f
+#endif
+2:	addi	r3,r1,STACK_FRAME_OVERHEAD
+   bl  restore_math
+   ld  r8,_MSR(r1)
+   ld  r3,RESULT(r1)
+   li  r11,-MAX_ERRNO
+
+3: cmpld   r3,r11
ld  r5,_CCR(r1)
bge-syscall_error
 .Lsyscall_error_cont:
@@ -602,8 +615,8 @@ _GLOBAL(ret_from_except_lite)
 
/* Check current_thread_info()->flags */
andi.   r0,r4,_TIF_USER_WORK_MASK
-#ifdef CONFIG_PPC_BOOK3E
bne 1f
+#ifdef CONFIG_PPC_BOOK3E
/*
 * Check to see if the dbcr0 register is set up to debug.
 * Use the internal debug mode bit to do this.
@@ -618,7 +631,9 @@ _GLOBAL(ret_from_except_lite)
mtspr   SPRN_DBSR,r10
b   r

[PATCH v5 4/9] powerpc: Explicitly disable math features when copying thread

2016-02-22 Thread Cyril Bur
Currently when threads get scheduled off they always give up the FPU,
Altivec (VMX) and Vector-Scalar (VSX) units if they were using them. When
they are scheduled back on, a fault is then taken to enable each facility
and load the registers. As a result, explicitly disabling FPU/VMX/VSX has
not been necessary.

Future changes and optimisations remove this mandatory give-up and fault,
which could cause calls such as clone() and fork() to copy threads and run
them later with FPU/VMX/VSX enabled but no registers loaded.

This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
thread's registers are hot, while a clear MSR_{FP,VEC,VSX} means that the
registers must be loaded. This allows for a smarter return to userspace.
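
Put another way, the new invariant is roughly the following (an
illustrative sketch only, not code from the patch):

    /* With this series, MSR_FP in a thread's saved MSR tracks whether
     * its FP registers are live in the CPU, not merely whether FP use
     * is permitted; clear means fp_state must be reloaded before use. */
    static bool fp_regs_hot(struct pt_regs *regs)
    {
            return (regs->msr & MSR_FP) != 0;
    }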

Signed-off-by: Cyril Bur 
---
 arch/powerpc/kernel/process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dccc87e..e0c3d2d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long 
usp,
 
f = ret_from_fork;
}
+   childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
sp -= STACK_FRAME_OVERHEAD;
 
/*
-- 
2.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v5 3/9] selftests/powerpc: Test FPU and VMX regs in signal ucontext

2016-02-22 Thread Cyril Bur
Load up the non-volatile FPU and VMX regs and ensure that they hold the
expected values in a signal handler.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/math/.gitignore   |   2 +
 tools/testing/selftests/powerpc/math/Makefile |   4 +-
 tools/testing/selftests/powerpc/math/fpu_signal.c | 135 +++
 tools/testing/selftests/powerpc/math/vmx_signal.c | 156 ++
 4 files changed, 296 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c

diff --git a/tools/testing/selftests/powerpc/math/.gitignore 
b/tools/testing/selftests/powerpc/math/.gitignore
index 1a6f09e..4fe13a4 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -2,3 +2,5 @@ fpu_syscall
 vmx_syscall
 fpu_preempt
 vmx_preempt
+fpu_signal
+vmx_signal
diff --git a/tools/testing/selftests/powerpc/math/Makefile 
b/tools/testing/selftests/powerpc/math/Makefile
index 5f5617c..10df9d8 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
+TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt 
vmx_signal
 
 all: $(TEST_PROGS)
 
@@ -9,9 +9,11 @@ $(TEST_PROGS): CFLAGS = $(filter-out -flto,$(CFLAGS) -O2 -g 
-pthread -m64 -malti
 
 fpu_syscall: fpu_asm.S
 fpu_preempt: fpu_asm.S
+fpu_signal:  fpu_asm.S
 
 vmx_syscall: vmx_asm.S
 vmx_preempt: vmx_asm.S
+vmx_signal: vmx_asm.S
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c 
b/tools/testing/selftests/powerpc/math/fpu_signal.c
new file mode 100644
index 000..888aa51
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_signal.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers are correctly reported in a
+ * signal context. Each worker just spins checking its FPU registers; at some
+ * point a signal will interrupt it, and C code will check that the signal
+ * context reports the same values.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+2.1};
+
+bool bad_context;
+int threads_starting;
+int running;
+
+extern long preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void signal_fpu_sig(int sig, siginfo_t *info, void *context)
+{
+   int i;
+   ucontext_t *uc = context;
+   mcontext_t *mc = &uc->uc_mcontext;
+
+   /* Only the non-volatiles were loaded up */
+   for (i = 14; i < 32; i++) {
+   if (mc->fp_regs[i] != darray[i - 14]) {
+   bad_context = true;
+   break;
+   }
+   }
+}
+
+void *signal_fpu_c(void *p)
+{
+   int i;
+   long rc;
+   struct sigaction act;
+   act.sa_sigaction = signal_fpu_sig;
+   act.sa_flags = SA_SIGINFO;
+   rc = sigaction(SIGUSR1, &act, NULL);
+   if (rc)
+   return p;
+
+   srand(pthread_self());
+   for (i = 0; i < 21; i++)
+   darray[i] = rand();
+
+   rc = preempt_fpu(darray, &threads_starting, &running);
+
+   return (void *) rc;
+}
+
+int test_signal_fpu(void)
+{
+   int i, j, rc, threads;
+   void *rc_p;
+   pthread_t *tids;
+
+   threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+   tids = malloc(threads * sizeof(pthread_t));
+   FAIL_IF(!tids);
+
+   running = true;
+   threads_starting = threads;
+   for (i = 0; i < threads; i++) {
+   rc = pthread_create(&tids[i], NULL, signal_fpu_c, NULL);
+   FAIL_IF(rc);
+   }
+
+   setbuf(stdout, NULL);
+   printf("\tWaiting for all workers to start...");
+   while (threads_starting)
+   asm volatile("": : :"memory");
+   printf("done\n");
+
+   printf("\tSending signals to all threads %d times...", ITERATIONS);
+   for (i = 0; i < ITERATIONS; i++) {
+   for (j = 0; j < threads; j++) {
+   pthread_kill(tids[j], SIGUSR1);
+   }
+   sleep(1);
+   }
+   printf("done\n");
+
+   printf("\tStopping workers..

[PATCH v5 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall

2016-02-22 Thread Cyril Bur
Test that the non-volatile floating point and Altivec registers get
correctly preserved across the fork() syscall.

fork() works nicely for this purpose: the registers should be the same for
both parent and child.
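
The shape of the test is roughly the following C sketch. In the real
test the register loads and comparisons are done in assembly
(fpu_asm.S/vmx_asm.S) so the compiler cannot spill or clobber the
registers; check_regs() below is a hypothetical stand-in for that
assembly helper.

    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern int check_regs(void);        /* hypothetical asm helper: 1 if intact */

    static int fork_and_check(void)
    {
            int status;
            pid_t pid = fork();

            if (pid == -1)
                    return 1;
            if (pid == 0)                   /* child: same register values? */
                    _exit(check_regs() ? 0 : 1);
            if (!check_regs())              /* parent's copy must survive too */
                    return 1;
            waitpid(pid, &status, 0);
            return WEXITSTATUS(status);     /* 0 on success */
    }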

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/Makefile   |   3 +-
 tools/testing/selftests/powerpc/basic_asm.h|  62 +++
 tools/testing/selftests/powerpc/math/.gitignore|   2 +
 tools/testing/selftests/powerpc/math/Makefile  |  16 ++
 tools/testing/selftests/powerpc/math/fpu_asm.S | 161 +
 tools/testing/selftests/powerpc/math/fpu_syscall.c |  90 ++
 tools/testing/selftests/powerpc/math/vmx_asm.S | 195 +
 tools/testing/selftests/powerpc/math/vmx_syscall.c |  91 ++
 8 files changed, 619 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/math/Makefile
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 0c2706b..19e8191 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -22,7 +22,8 @@ SUB_DIRS = benchmarks \
   switch_endian\
   syscalls \
   tm   \
-  vphn
+  vphn \
+  math
 
 endif
 
diff --git a/tools/testing/selftests/powerpc/basic_asm.h 
b/tools/testing/selftests/powerpc/basic_asm.h
new file mode 100644
index 000..f56482f
--- /dev/null
+++ b/tools/testing/selftests/powerpc/basic_asm.h
@@ -0,0 +1,62 @@
+#include 
+#include 
+
+#if defined(_CALL_ELF) && _CALL_ELF == 2
+#define STACK_FRAME_MIN_SIZE 32
+#define STACK_FRAME_TOC_POS  24
+#define __STACK_FRAME_PARAM(_param)  (32 + ((_param)*8))
+#define __STACK_FRAME_LOCAL(_num_params,_var_num)  ((STACK_FRAME_PARAM(_num_params)) + ((_var_num)*8))
+#else
+#define STACK_FRAME_MIN_SIZE 112
+#define STACK_FRAME_TOC_POS  40
+#define __STACK_FRAME_PARAM(i)  (48 + ((i)*8))
+/*
+ * Caveat: if a function is passed more than 8 params, the caller will have
+ * made more space... this should be reflected in this C code:
+ * if (_num_params > 8)
+ * total = 112 + ((_num_params - 8) * 8)
+ *
+ * And substitute 'total' for the '112' in the macro. Doable in preprocessor for ASM?
+ */
+#define __STACK_FRAME_LOCAL(_num_params,_var_num)  (112 + ((_var_num)*8))
+#endif
+/* Parameter x saved to the stack */
+#define STACK_FRAME_PARAM(var)  __STACK_FRAME_PARAM(var)
+/* Local variable x saved to the stack after x parameters */
+#define STACK_FRAME_LOCAL(num_params,var)  __STACK_FRAME_LOCAL(num_params,var)
+#define STACK_FRAME_LR_POS   16
+#define STACK_FRAME_CR_POS   8
+
+#define LOAD_REG_IMMEDIATE(reg,expr) \
+   lis reg,(expr)@highest; \
+   ori reg,reg,(expr)@higher;  \
+   rldicr  reg,reg,32,31;  \
+   oris    reg,reg,(expr)@high;\
+   ori reg,reg,(expr)@l;
+
+/* It is very important to note here that _extra is the extra amount of
+ * stack space needed.
+ * This space must be accessed using STACK_FRAME_PARAM() or
+ * STACK_FRAME_LOCAL() macros!
+ *
+ * r1 and r2 are not defined in ppc-asm.h (instead they are defined as sp
+ * and toc). Kernel programmers tend to prefer rX even for r1 and r2, hence
+ * %r1 and %r2. r0 is defined in ppc-asm.h and therefore %r0 gets
+ * preprocessed incorrectly, hence r0.
+ */
+#define PUSH_BASIC_STACK(_extra) \
+   mflr    r0; \
+   std r0,STACK_FRAME_LR_POS(%r1); \
+   stdu    %r1,-(_extra + STACK_FRAME_MIN_SIZE)(%r1); \
+   mfcr    r0; \
+   stw r0,STACK_FRAME_CR_POS(%r1); \
+   std %r2,STACK_FRAME_TOC_POS(%r1);
+
+#define POP_BASIC_STACK(_extra) \
+   ld  %r2,STACK_FRAME_TOC_POS(%r1); \
+   lwz r0,STACK_FRAME_CR_POS(%r1); \
+   mtcr    r0; \
+   addi    %r1,%r1,(_extra + STACK_FRAME_MIN_SIZE); \
+   ld  r0,STACK_FRAME_LR_POS(%r1); \
+   mtlr    r0;
+
diff --git a/tools/testing/selftests/powerpc/math/.gitignore 
b/tools/testing/selftests/powerpc/math/.gitignore
new file mode 100644
index 000..b19b269
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -0,0 +1,2 @@
+fpu_syscall
+vmx_syscall
diff --git a/tools/testing/selftests/powerpc/math/Makefile 
b/tools/testing/selftests/powerpc/math/Makefile
new file mode 100644
index 000..598e5df
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -0,0 +1,16 @@
+TEST_PROGS := fpu_syscall vmx_syscall
+
+all: $(TEST_PROGS)
+
+#The general powerpc makefile adds -flto. This isn't interacting well 

[PATCH v5 0/9] FP/VEC/VSX switching optimisations

2016-02-22 Thread Cyril Bur
Cover-letter for V1 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136350.html

Cover-letter for V2 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/138054.html

Changes in V3:
Addressed review comments from Michael Neuling
 - Made commit message in 4/9 better reflect the patch
 - Removed overuse of #ifdef blocks and redundant condition in 5/9
 - Split 6/8 in two to better prepare for 7,8,9
 - Removed #ifdefs in 6/9

Changes in V4:
 - Addressed non-ABI-compliant ASM macros in 1/9
 - Fixed build breakage due to changing #ifdefs in V3 (6/9)
 - Reordered some conditions in if statements

Changes in V5:
 - Enhanced basic_asm.h to provide an ABI-independent macro, as pointed out by
   Naveen Rao.
   - Tested for both BE and LE builds. Had to disable -flto from the
 selftests/powerpc Makefile as it didn't play well with the custom ASM.
 - Added some extra debugging output to the vmx_signal testcase
 - Fixed comments in testing code
 - Updated VSX test code to use GCC Altivec macros



Cyril Bur (9):
  selftests/powerpc: Test the preservation of FPU and VMX regs across
syscall
  selftests/powerpc: Test preservation of FPU and VMX regs across
preemption
  selftests/powerpc: Test FPU and VMX regs in signal ucontext
  powerpc: Explicitly disable math features when copying thread
  powerpc: Restore FPU/VEC/VSX if previously used
  powerpc: Prepare for splitting giveup_{fpu,altivec,vsx} in two
  powerpc: Add the ability to save FPU without giving it up
  powerpc: Add the ability to save Altivec without giving it up
  powerpc: Add the ability to save VSX without giving it up

 arch/powerpc/include/asm/processor.h   |   2 +
 arch/powerpc/include/asm/reg.h |   8 +
 arch/powerpc/include/asm/switch_to.h   |  13 +-
 arch/powerpc/kernel/asm-offsets.c  |   2 +
 arch/powerpc/kernel/entry_64.S |  21 +-
 arch/powerpc/kernel/fpu.S  |  25 +--
 arch/powerpc/kernel/ppc_ksyms.c|   4 -
 arch/powerpc/kernel/process.c  | 168 +--
 arch/powerpc/kernel/vector.S   |  45 +---
 tools/testing/selftests/powerpc/Makefile   |   3 +-
 tools/testing/selftests/powerpc/basic_asm.h|  62 ++
 tools/testing/selftests/powerpc/math/.gitignore|   6 +
 tools/testing/selftests/powerpc/math/Makefile  |  21 ++
 tools/testing/selftests/powerpc/math/fpu_asm.S | 197 +
 tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 ++
 tools/testing/selftests/powerpc/math/fpu_signal.c  | 135 
 tools/testing/selftests/powerpc/math/fpu_syscall.c |  90 
 tools/testing/selftests/powerpc/math/vmx_asm.S | 234 +
 tools/testing/selftests/powerpc/math/vmx_preempt.c | 112 ++
 tools/testing/selftests/powerpc/math/vmx_signal.c  | 156 ++
 tools/testing/selftests/powerpc/math/vmx_syscall.c |  91 
 21 files changed, 1425 insertions(+), 83 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/math/Makefile
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c

-- 
2.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v5] powerpc32: provide VIRT_CPU_ACCOUNTING

2016-02-22 Thread Michael Ellerman
On Mon, 2016-02-22 at 20:15 -0600, Scott Wood wrote:
> On Tue, 2016-02-23 at 13:04 +1100, Michael Ellerman wrote:
> > On Tue, 2016-02-16 at 15:21 -0600, Scott Wood wrote:
> > > On Thu, 2016-02-11 at 17:16 +0100, Christophe Leroy wrote:
> > > > This patch provides VIRT_CPU_ACCOUNTING to PPC32 architecture.
> > > > PPC32 doesn't have the PACA structure, so we use the task_info
> > > > structure to store the accounting data.
> > > > 
> > > > In order to reuse on PPC32 the PPC64 functions, all u64 data has
> > > > been replaced by 'unsigned long' so that it is u32 on PPC32 and
> > > > u64 on PPC64
> > > > 
> > > > Signed-off-by: Christophe Leroy 
> > > > ---
> > > > Changes in v3: unlike previous version of the patch that was inspired
> > > > from IA64 architecture, this new version tries to reuse as much as
> > > > possible the PPC64 implementation.
> > > > 
> > > > PPC32 doesn't have PACA and past discussion on v2 version has shown
> > > > that it is not worth implementing a PACA in PPC32 architecture
> > > > (see below benh opinion)
> > > > 
> > > > benh: PACA is actually a data structure and you really really don't want
> > > > it
> > > > on ppc32 :-) Having a register point to current works, having a register
> > > > point to per-cpu data instead works too (ie, change what we do today),
> > > > but don't introduce a PACA *please* :-)
> > > 
> > > And Ben never replied to my reply at the time:
> > > 
> > > "What is special about 64-bit that warrants doing things differently from
> > > 32
> > > -bit?
> > 
> > Nothing. It's just historical cruft. But we're not realistically going to
> > get
> > rid of it anytime soon on 64-bit.
> 
> I wasn't suggesting getting rid of it on 64-bit, but rather adding it on 32
> -bit, to hold things that are used by both.  I was confused by the vehemence
> of Ben's objection.

OK right. I think he's just saying we'd like to (eventually) get rid of it on
64-bit, so adding it on 32-bit would be a step backward.

> > > What is the difference between PACA and "per-cpu data", other than the
> > > obscure name?"
> > 
> > Not much. The pacas are allocated differently to per-cpu data, they're
> > available earlier in boot etc.
> 
> Ah, I was thinking of the general concept of per-cpu data, not the specific
> mechanism that Linux implements in percpu.h etc.

Oh ok, in that case no it's not special at all.

> >  What we'd like is to have r13 point to the
> > per-cpu data area, and then the contents of the paca could just be regular
> > per-cpu data. But like I said above that's a big change.
> 
> That change seems orthogonal to the question of making the mechanism available
> on 32-bit to ease unification of code which uses it.

That's true.

Though in this case I think you actually do want to store those values in the 
thread_info.
If you look at eg. vtime_delta() where we use those values, it's passed a task
struct.

So your suggestion to define a common struct that is shared between the 32-bit
thread_info and the 64-bit paca would be a good solution I think.
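
For illustration only, such a shared struct might look something like
the sketch below; the struct and field names are hypothetical, just
covering the quantities the vtime code tracks.

    /* Embedded in thread_info on 32-bit and in the paca on 64-bit;
     * 'unsigned long' is u32 on PPC32 and u64 on PPC64, matching the
     * type change the patch already makes. */
    struct cpu_accounting_data {
            unsigned long user_time;        /* accumulated user TB ticks */
            unsigned long system_time;      /* accumulated system TB ticks */
            unsigned long starttime;        /* TB value at last state switch */
            unsigned long startspurr;       /* SPURR value at last entry */
    };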

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 3/9] powerpc/mm/book3s-64: Use physical addresses in upper page table tree levels

2016-02-22 Thread Paul Mackerras
This changes the Linux page tables to store physical addresses
rather than kernel virtual addresses in the upper levels of the
tree (pgd, pud and pmd) for 64-bit Book 3S machines.

This also changes the hugepd pointers used to implement hugepages
when the base page size is 4k to store physical addresses rather than
virtual addresses (again just for 64-bit Book3S machines).

This frees up some high order bits, and will be needed with
PowerISA v3.0 machines which read the page table tree in hardware
in radix mode.
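
The core of the conversion can be pictured as the following pairing, an
illustrative sketch with made-up "sketch_" names; the real accessors,
including the masking of the low bits, are in the diff below.

    /* The populate side now stores __pa(next_level); the vaddr
     * accessor undoes it with __va(), as pgd_page_vaddr() et al.
     * do below. */
    static inline void sketch_pud_populate(pud_t *pud, pmd_t *pmd)
    {
            *pud = __pud(__pa(pmd));        /* physical address in the entry */
    }

    static inline unsigned long sketch_pud_page_vaddr(pud_t pud)
    {
            return (unsigned long)__va(pud_val(pud));
    }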

Signed-off-by: Paul Mackerras 
---
v2: Also convert the hugepd pointers, which fixes a kernel crash when
using huge pages under a 4k-page kernel.

 arch/powerpc/include/asm/book3s/64/hash-4k.h |  2 +-
 arch/powerpc/include/asm/book3s/64/hash.h| 13 +++--
 arch/powerpc/include/asm/hugetlb.h   |  2 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h |  3 +++
 arch/powerpc/include/asm/page.h  |  7 +++
 arch/powerpc/include/asm/pgalloc-64.h| 16 
 arch/powerpc/mm/hugetlbpage.c|  3 +--
 7 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index bee3643..0425d3e 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -64,7 +64,7 @@
 #define pgd_none(pgd)  (!pgd_val(pgd))
 #define pgd_bad(pgd)   (pgd_val(pgd) == 0)
 #define pgd_present(pgd)   (pgd_val(pgd) != 0)
-#define pgd_page_vaddr(pgd)(pgd_val(pgd) & ~PGD_MASKED_BITS)
+#define pgd_page_vaddr(pgd)__va(pgd_val(pgd) & ~PGD_MASKED_BITS)
 
 static inline void pgd_clear(pgd_t *pgdp)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 64eff40..5b8ba60 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -222,13 +222,14 @@
 #define PUD_BAD_BITS   (PMD_TABLE_SIZE-1)
 
 #ifndef __ASSEMBLY__
-#define pmd_bad(pmd)           (!is_kernel_addr(pmd_val(pmd)) \
-                                || (pmd_val(pmd) & PMD_BAD_BITS))
-#define pmd_page_vaddr(pmd)    (pmd_val(pmd) & ~PMD_MASKED_BITS)
+#define pmd_bad(pmd)           (pmd_val(pmd) & PMD_BAD_BITS)
+#define pmd_page_vaddr(pmd)    __va(pmd_val(pmd) & ~PMD_MASKED_BITS)
 
-#define pud_bad(pud)           (!is_kernel_addr(pud_val(pud)) \
-                                || (pud_val(pud) & PUD_BAD_BITS))
-#define pud_page_vaddr(pud)    (pud_val(pud) & ~PUD_MASKED_BITS)
+#define pud_bad(pud)           (pud_val(pud) & PUD_BAD_BITS)
+#define pud_page_vaddr(pud)    __va(pud_val(pud) & ~PUD_MASKED_BITS)
+
+/* Pointers in the page table tree are physical addresses */
+#define __pgtable_ptr_val(ptr) __pa(ptr)
 
 #define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_PGD - 1))
 #define pmd_index(address) (((address) >> (PMD_SHIFT)) & (PTRS_PER_PMD - 1))
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 7eac89b..42814f0 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -19,7 +19,7 @@ static inline pte_t *hugepd_page(hugepd_t hpd)
 * We have only four bits to encode, MMU page size
 */
BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf);
-   return (pte_t *)(hpd.pd & ~HUGEPD_SHIFT_MASK);
+   return __va(hpd.pd & HUGEPD_ADDR_MASK);
 }
 
 static inline unsigned int hugepd_mmu_psize(hugepd_t hpd)
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index b9f734d..10debb9 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -108,6 +108,9 @@
 #ifndef __ASSEMBLY__
 /* pte_clear moved to later in this file */
 
+/* Pointers in the page table tree are virtual addresses */
+#define __pgtable_ptr_val(ptr) ((unsigned long)(ptr))
+
 #define PMD_BAD_BITS   (PTE_TABLE_SIZE-1)
 #define PUD_BAD_BITS   (PMD_TABLE_SIZE-1)
 
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index e34124f..af7a342 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -271,6 +271,13 @@ extern long long virt_phys_offset;
 #else
 #define PD_HUGE 0x8000
 #endif
+
+#else  /* CONFIG_PPC_BOOK3S_64 */
+/*
+ * Book3S 64 stores real addresses in the hugepd entries to
+ * avoid overlaps with _PAGE_PRESENT and _PAGE_PTE.
+ */
+#define HUGEPD_ADDR_MASK   (0x0ffful & ~HUGEPD_SHIFT_MASK)
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 /*
diff --git a/arch/powerpc/include/asm/pgalloc-64.h 
b/arch/powerpc/include/asm/pgalloc-64.h
index 69ef28a..7ac59a3 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -53,7 +53,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 
 #ifndef CONFIG_PPC_64K_PAGES
 
-#define p

Re: [PATCH V2 00/29] Book3s abstraction in preparation for new MMU model

2016-02-22 Thread Aneesh Kumar K.V
Scott Wood  writes:

> On Tue, 2016-02-09 at 18:52 +0530, Aneesh Kumar K.V wrote:
>> 
>> Hi Scott,
>> 
>> I missed adding you on CC:, Can you take a look at this and make sure we
>> are not breaking anything on freescale.
>
> I'm having trouble getting it to apply cleanly.  Do you have a git tree I can
> test?
>


https://github.com/kvaneesh/linux/commits/radix-mmu-v2

-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v5] powerpc32: provide VIRT_CPU_ACCOUNTING

2016-02-22 Thread Scott Wood
On Tue, 2016-02-23 at 13:04 +1100, Michael Ellerman wrote:
> On Tue, 2016-02-16 at 15:21 -0600, Scott Wood wrote:
> 
> > On Thu, 2016-02-11 at 17:16 +0100, Christophe Leroy wrote:
> 
> > > This patch provides VIRT_CPU_ACCOUNTING to PPC32 architecture.
> > > PPC32 doesn't have the PACA structure, so we use the task_info
> > > structure to store the accounting data.
> > > 
> > > In order to reuse on PPC32 the PPC64 functions, all u64 data has
> > > been replaced by 'unsigned long' so that it is u32 on PPC32 and
> > > u64 on PPC64
> > > 
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > > Changes in v3: unlike previous version of the patch that was inspired
> > > from IA64 architecture, this new version tries to reuse as much as
> > > possible the PPC64 implementation.
> > > 
> > > PPC32 doesn't have PACA and past discussion on v2 version has shown
> > > that it is not worth implementing a PACA in PPC32 architecture
> > > (see below benh opinion)
> > > 
> > > benh: PACA is actually a data structure and you really really don't want
> > > it
> > > on ppc32 :-) Having a register point to current works, having a register
> > > point to per-cpu data instead works too (ie, change what we do today),
> > > but don't introduce a PACA *please* :-)
> > 
> > And Ben never replied to my reply at the time:
> > 
> > "What is special about 64-bit that warrants doing things differently from
> > 32
> > -bit?
> 
> Nothing. It's just historical cruft. But we're not realistically going to
> get
> rid of it anytime soon on 64-bit.

I wasn't suggesting getting rid of it on 64-bit, but rather adding it on 32
-bit, to hold things that are used by both.  I was confused by the vehemence
of Ben's objection.

> > What is the difference between PACA and "per-cpu data", other than the
> > obscure name?"
> 
> Not much. The pacas are allocated differently to per-cpu data, they're
> available earlier in boot etc.

Ah, I was thinking of the general concept of per-cpu data, not the specific
mechanism that Linux implements in percpu.h etc.

>  What we'd like is to have r13 point to the
> per-cpu data area, and then the contents of the paca could just be regular
> per-cpu data. But like I said above that's a big change.

That change seems orthogonal to the question of making the mechanism available
on 32-bit to ease unification of code which uses it.

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/1] powerpc: Detect broken or mismatched toolchains

2016-02-22 Thread Scott Wood
On Mon, 2016-02-22 at 16:13 +1100, Sam Bobroff wrote:
> It can currently be difficult to diagnose a build that fails due to
> the compiler, linker or other parts of the toolchain being unable to
> build binaries of the type required by the kernel config. For example
> using a little endian toolchain to build a big endian kernel may
> produce:
> 
> as: unrecognized option '-maltivec'
> 
> This patch adds a basic compile test and error message to
> arch/powerpc/Makefile so that the above error becomes:
> 
> *** Sorry, your toolchain seems to be broken or incorrect. ***
> Make sure it supports your kernel configuration (ppc64).
> 
> Signed-off-by: Sam Bobroff 
> ---

How is this more useful than getting to actually see the way in which the
toolchain (or the CFLAGS) is broken?

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v5] powerpc32: provide VIRT_CPU_ACCOUNTING

2016-02-22 Thread Michael Ellerman
On Tue, 2016-02-16 at 15:21 -0600, Scott Wood wrote:

> On Thu, 2016-02-11 at 17:16 +0100, Christophe Leroy wrote:

> > This patch provides VIRT_CPU_ACCOUNTING to PPC32 architecture.
> > PPC32 doesn't have the PACA structure, so we use the task_info
> > structure to store the accounting data.
> > 
> > In order to reuse on PPC32 the PPC64 functions, all u64 data has
> > been replaced by 'unsigned long' so that it is u32 on PPC32 and
> > u64 on PPC64
> > 
> > Signed-off-by: Christophe Leroy 
> > ---
> > Changes in v3: unlike previous version of the patch that was inspired
> > from IA64 architecture, this new version tries to reuse as much as
> > possible the PPC64 implementation.
> > 
> > PPC32 doesn't have PACA and past discussion on v2 version has shown
> > that it is not worth implementing a PACA in PPC32 architecture
> > (see below benh opinion)
> > 
> > benh: PACA is actually a data structure and you really really don't want it
> > on ppc32 :-) Having a register point to current works, having a register
> > point to per-cpu data instead works too (ie, change what we do today),
> > but don't introduce a PACA *please* :-)
> 
> And Ben never replied to my reply at the time:
> 
> "What is special about 64-bit that warrants doing things differently from 32
> -bit?

Nothing. It's just historical cruft. But we're not realistically going to get
rid of it anytime soon on 64-bit.

> What is the difference between PACA and "per-cpu data", other than the
> obscure name?"

Not much. The pacas are allocated differently to per-cpu data, they're
available earlier in boot etc. What we'd like is to have r13 point to the
per-cpu data area, and then the contents of the paca could just be regular
per-cpu data. But like I said above that's a big change.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V2 00/29] Book3s abstraction in preparation for new MMU model

2016-02-22 Thread Scott Wood
On Tue, 2016-02-09 at 18:52 +0530, Aneesh Kumar K.V wrote:
> 
> Hi Scott,
> 
> I missed adding you on CC:, Can you take a look at this and make sure we
> are not breaking anything on freescale.

I'm having trouble getting it to apply cleanly.  Do you have a git tree I can
test?

-Scott

> "Aneesh Kumar K.V"  writes:
> 
> > Hello,
> > 
> > This is a large series, mostly consisting of code movement. No new
> > features
> > are done in this series. The changes are done to accomodate the upcoming
> > new memory
> > model in future powerpc chips. The details of the new MMU model can be
> > found at
> > 
> >  http://ibm.biz/power-isa3 (Needs registration). I am including a summary
> > of the changes below.
> > 
> > ISA 3.0 adds support for the radix tree style of MMU with full
> > virtualization and related control mechanisms that manage its
> > coexistence with the HPT. Radix-using operating systems will
> > manage their own translation tables instead of relying on hcalls.
> > 
> > Radix style MMU model requires us to do a 4 level page table
> > with 64K and 4K page sizes. The table index size for each page size
> > is listed below:
> > 
> > PGD -> 13 bits
> > PUD -> 9 (1G hugepage)
> > PMD -> 9 (2M huge page)
> > PTE -> 5 (for 64k), 9 (for 4k)
> > 
> > We also require the page table to be in big endian format.
> > 
> > The changes proposed in this series enables us to support both
> > hash page table and radix tree style MMU using a single kernel
> > with limited impact. The idea is to change core page table
> > accessors to static inline functions and later hotpatch them
> > to switch to hash or radix tree functions. For ex:
> > 
> > static inline int pte_write(pte_t pte)
> > {
> >         if (radix_enabled())
> >                 return rpte_write(pte);
> >         return hlpte_write(pte);
> > }
> > 
> > On boot we will hotpatch the code so as to avoid conditional operation.
> > 
> > The other two major changes proposed in this series are to switch the
> > hash linux page table to a 4 level table in big endian format. This is
> > done so that functions like pte_val(), pud_populate() don't need
> > hotpatching, thereby helping to limit the runtime impact of the changes.
> > 
> > I didn't included the radix related changes in this series. You can
> > find them at https://github.com/kvaneesh/linux/commits/radix-mmu-v1
> > 
> > Changes from V1:
> > * move patches adding helpers to the next series
> > 
> 
> 
> Thanks
> -aneesh
> 
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v5] powerpc32: provide VIRT_CPU_ACCOUNTING

2016-02-22 Thread Scott Wood
On Wed, 2016-02-17 at 17:29 +0100, Christophe Leroy wrote:
> 
> Le 16/02/2016 22:21, Scott Wood a écrit :
> > On Thu, 2016-02-11 at 17:16 +0100, Christophe Leroy wrote:
> > > This patch provides VIRT_CPU_ACCOUNTING to PPC32 architecture.
> > > PPC32 doesn't have the PACA structure, so we use the task_info
> > > structure to store the accounting data.
> > > 
> > > In order to reuse on PPC32 the PPC64 functions, all u64 data has
> > > been replaced by 'unsigned long' so that it is u32 on PPC32 and
> > > u64 on PPC64
> > > 
> > > Signed-off-by: Christophe Leroy 
> > > ---
> > > Changes in v3: unlike previous version of the patch that was inspired
> > > from IA64 architecture, this new version tries to reuse as much as
> > > possible the PPC64 implementation.
> > > 
> > > PPC32 doesn't have PACA and past discussion on v2 version has shown
> > > that it is not worth implementing a PACA in PPC32 architecture
> > > (see below benh opinion)
> > > 
> > > benh: PACA is actually a data structure and you really really don't want
> > > it
> > > on ppc32 :-) Having a register point to current works, having a register
> > > point to per-cpu data instead works too (ie, change what we do today),
> > > but don't introduce a PACA *please* :-)
> > And Ben never replied to my reply at the time:
> > 
> > "What is special about 64-bit that warrants doing things differently from
> > 32
> > -bit?  What is the difference between PACA and "per-cpu data", other than
> > the
> > obscure name?"
> > 
> > I can understand wanting to avoid churn, but other than that, doing things
> > differently on 64-bit versus 32-bit sucks.
> > 
> 
> What I can see is that PACA is always available via register r13. Do we 
> have anything equivalent on PPC32 ?

Just current in r2, which is the task_struct, not a task-independent per-cpu
area.

> If we define a per-cpu data for accounting, what will be the quick way 
> to get access to it in entry_32.S ?
> Something like a table of accounting data for each CPU, that we index 
> with thread_info->cpu ?
> This would allow a quite quick access, is it the good way to proceed in 
> order to have something closer to PPC64 ?

Possibly.

-Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/9] powerpc/mm/book3s-64: Use physical addresses in upper page table tree levels

2016-02-22 Thread Paul Mackerras
On Mon, Feb 22, 2016 at 10:25:51AM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras  writes:
> 
> > From: Paul Mackerras 
> >
> > This changes the Linux page tables to store physical addresses
> > rather than kernel virtual addresses in the upper levels of the
> > tree (pgd, pud and pmd) for 64-bit Book 3S machines.
> >
> > This frees up some high order bits, and will be needed with
> > PowerISA v3.0 machines which read the page table tree in hardware
> > in radix mode.
> 
> How about hugepd pointer with 4k linux page size ?

Yes, I need to fix them too.  New series coming...

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/9] powerpc/mm/book3s-64: Use physical addresses in upper page table tree levels

2016-02-22 Thread Paul Mackerras
On Mon, Feb 22, 2016 at 12:36:03PM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras  writes:
> 
> > From: Paul Mackerras 
> >
> > This changes the Linux page tables to store physical addresses
> > rather than kernel virtual addresses in the upper levels of the
> > tree (pgd, pud and pmd) for 64-bit Book 3S machines.
> >
> > This frees up some high order bits, and will be needed with
> > PowerISA v3.0 machines which read the page table tree in hardware
> > in radix mode.
> >
> 
> Radix marks the top two bits at the upper levels of the page table tree.
> 
> ie,
> 
> 
> static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> {
>   pud_set(pud, __pgtable_ptr_val(pmd));
> }
> 
> static inline void rpud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> {
>   *pud = __pud(__pa(pmd) | RPUD_VAL_BITS);
> }
> 
> 
> I guess we will do the same with hash to keep them same ?

Yes, that makes sense.  I assume RPUD_VAL_BITS above is basically
_PAGE_PRESENT plus the next-level-size field at the bottom.  Setting
that next-level-size field will make it look like a hugepd pointer, so
we'll need some other way to distinguish them (maybe the _PAGE_PRESENT
bit?).
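
Purely as an illustration of that idea (nothing settled here, and the
helper name is made up): if regular next-level pointers always carried
_PAGE_PRESENT and hugepd pointers never did, the check could be as
simple as:

    /* Hypothetical: tell a hugepd pointer apart from a regular
     * next-level pointer by the _PAGE_PRESENT bit alone. */
    static inline bool entry_is_hugepd(unsigned long pdval)
    {
            return !(pdval & _PAGE_PRESENT);
    }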

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/pagetable: Add option to dump kernel pagetable

2016-02-22 Thread Rashmica

Hi Anshuman,

Thanks for the feedback!

On 22/02/16 21:13, Anshuman Khandual wrote:

On 02/22/2016 11:32 AM, Rashmica Gupta wrote:

Useful to be able to dump the kernel page tables to check permissions and
memory types - derived from arm64's implementation.

Add a debugfs file to check the page tables. To use this the PPC_PTDUMP
config option must be selected.

Tested on 64BE and 64LE with both 4K and 64K page sizes.
---

This statement above must be after the '---' line, else it will be part of
the commit message. Or did you want the test note as part of the commit
message itself?

The patch seems to contain some white space problems. Please clean them up.

Will do!

  arch/powerpc/Kconfig.debug |  12 ++
  arch/powerpc/mm/Makefile   |   1 +
  arch/powerpc/mm/dump.c | 364 +
  3 files changed, 377 insertions(+)
  create mode 100644 arch/powerpc/mm/dump.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 638f9ce740f5..e4883880abe3 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -344,4 +344,16 @@ config FAIL_IOMMU
  
  	  If you are unsure, say N.
  
+config PPC_PTDUMP

+bool "Export kernel pagetable layout to userspace via debugfs"
+depends on DEBUG_KERNEL
+select DEBUG_FS
+help
+  This options dumps the state of the kernel pagetables in a debugfs
+  file. This is only useful for kernel developers who are working in
+  architecture specific areas of the kernel - probably not a good idea 
to
+  enable this feature in a production kernel.

Just clean the paragraph alignment here 
..


+
+  If you are unsure, say N.
+
  endmenu
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 1ffeda85c086..16f84bdd7597 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -40,3 +40,4 @@ obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
  obj-$(CONFIG_HIGHMEM) += highmem.o
  obj-$(CONFIG_PPC_COPRO_BASE)  += copro_fault.o
  obj-$(CONFIG_SPAPR_TCE_IOMMU) += mmu_context_iommu.o
+obj-$(CONFIG_PPC_PTDUMP)   += dump.o

Would a file name like "[kernel_]pgtable_dump.c" sound better? Or
just use the x86 one, "dump_pagetables.c". "dump.c" sounds
very generic.

Yup, good point.

diff --git a/arch/powerpc/mm/dump.c b/arch/powerpc/mm/dump.c
new file mode 100644
index ..937b10fc40cc
--- /dev/null
+++ b/arch/powerpc/mm/dump.c
@@ -0,0 +1,364 @@
+/*
+ * Copyright 2016, Rashmica Gupta, IBM Corp.
+ *
+ * Debug helper to dump the current kernel pagetables of the system
+ * so that we can see what the various memory ranges are set to.
+ *
+ * Derived from the arm64 implementation:
+ * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
+ * (C) Copyright 2008 Intel Corporation, Arjan van de Ven.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PUD_TYPE_MASK   (_AT(u64, 3) << 0)
+#define PUD_TYPE_SECT   (_AT(u64, 1) << 0)
+#define PMD_TYPE_MASK   (_AT(u64, 3) << 0)
+#define PMD_TYPE_SECT   (_AT(u64, 1) << 0)
+
+
+#if CONFIG_PGTABLE_LEVELS == 2
+#include 
+#elif CONFIG_PGTABLE_LEVELS == 3
+#include 
+#endif

Really? Do we have any platform with only 2 levels of page table?
  

I'm not sure - was trying to cover all the bases. If you're
confident that we don't, I can remove it.

+
+#define pmd_sect(pmd)  ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
+#ifdef CONFIG_PPC_64K_PAGES
+#define pud_sect(pud)   (0)
+#else
+#define pud_sect(pud)   ((pud_val(pud) & PUD_TYPE_MASK) == \
+   PUD_TYPE_SECT)
+#endif

Can you please explain the use of pmd_sect() and pud_sect() defines ?


+   
+
+struct addr_marker {
+   unsigned long start_address;
+   const char *name;
+};

All the architectures are using the same addr_marker structure.
Can't we just move it to a generic header file? There are
other such common structures in this file which are used
across architectures and could be moved somewhere common.

Could do that. Where do you think would be the appropriate place
for such a header file?

+
+enum address_markers_idx {
+   VMALLOC_START_NR = 0,
+   VMALLOC_END_NR,
+   ISA_IO_START_NR,
+   ISA_IO_END_NR,
+   PHB_IO_START_NR,
+   PHB_IO_END_NR,
+   IOREMAP_START_NR,
+   IOREMP_END_NR,
+};

Where are these used? ^ I don't see them anywhere.

Whoops, yes those are not used anymore.


Also, as it dumps only the kernel virtual mapping, should we not
mention that somewhere?

See my question below...

+
+static struct addr_marker address_marker

Re: Fwd: [PATCH v4 10/18] cxl: New hcalls to support CAPI adapters

2016-02-22 Thread Manoj Kumar

On 2/22/2016 12:14 PM, Frederic Barrat wrote:

platoform->platform

Irreverent to the Socratic amongst us.



Hope we didn't hurt your feelings :-D

   Fred


No, you did not!

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Fwd: [PATCH v4 08/18] cxl: IRQ allocation for guests

2016-02-22 Thread Manoj Kumar

On 2/22/2016 8:46 AM, Frederic Barrat wrote:

Le 21/02/2016 23:30, Manoj Kumar a écrit :

Subject: [PATCH v4 08/18] cxl: IRQ allocation for guests
Date: Tue, 16 Feb 2016 22:39:01 +0100
From: Frederic Barrat 
To: imun...@au1.ibm.com, michael.neul...@au1.ibm.com,
m...@ellerman.id.au, linuxppc-dev@lists.ozlabs.org

The PSL interrupt is not going to be multiplexed in a guest, so an
interrupt will be allocated for it for each context.


Not clear why this is the case. Why cannot the CXL layer still
multiplex this in a guest? Is this a design choice, an
architectural issue, or the complexity of implementation did
not warrant this? From an API perspective it would have been
preferable to not cascade this change down to all consumers,
and have consumers aware whether they are working in a
bare-metal or a guest environment.



It was a design choice made by pHyp. We cannot multiplex the PSL
interrupt with the current pHyp implementation.


If that is the case, perhaps the commit message should be re-worded.
As currently written, it seems like it was a choice made by
this patch.

The PSL interrupt cannot be multiplexed in a guest, because blah...

--
Manoj Kumar

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Fwd: [PATCH v4 11/18] cxl: Separate bare-metal fields in adapter and AFU data structures

2016-02-22 Thread Manoj Kumar

On 2/22/2016 11:57 AM, Frederic Barrat wrote:

Manoj,

Point taken. Those constants are all defined in the architecture
document (CAIA). We should probably use more macros there.
However, since those were not introduced by this patch, I'll put it in
my todo list for the future, but don't intend to address it in this
patchset.

   Fred


Fred:

I am fine with this approach.

--
Manoj Kumar

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Fwd: [PATCH v4 10/18] cxl: New hcalls to support CAPI adapters

2016-02-22 Thread Frederic Barrat



+
+/**
+ * cxl_h_validate_adapter_image - Validate the base image in the
coherent
+ *platoform facility.


platoform->platform

Irreverent to the Socratic amongst us.



Hope we didn't hurt your feelings :-D

  Fred

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Fwd: [PATCH v4 02/18] cxl: Move bare-metal specific code to specialized files

2016-02-22 Thread Frederic Barrat



Le 21/02/2016 22:44, Manoj Kumar a écrit :

Code specific to bare-metal is meant to be in native.c or pci.c
only. It's basically anything which touches the capi p1 registers,


I thought we were going to avoid using the CAPI term externally.
Please update if submitting a v4 of this patch series.


True, cxl should be preferred. I've renamed most of them for v5, just
leaving a couple in patch 0, since it doesn't stay, and where I believed
the context required it.


  Fred

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Fwd: [PATCH v4 12/18] cxl: Add guest-specific code

2016-02-22 Thread Frederic Barrat

Manoj,

cxl hasn't been and is not checkpatch-clean. That being said, we tried 
to not make it worse. I've let go 2 types of reports, which were already 
present in the cxl code:
- lines longer than 80 characters, when it's not showing a clear sign 
that code should be refactored

- assignment in if condition

I've fixed a couple of CodingStyle issues which were introduced in v4 of 
the patchset.



  Fred


Le 22/02/2016 02:29, Manoj Kumar a écrit :

Christophe, Fred:

Is getting the code checkpatch clean not a requirement for
this component?

total: 458 errors, 995 warnings, 1602 lines checked

NOTE: Whitespace errors detected.
   You may wish to use scripts/cleanpatch or scripts/cleanfile


I am stopping my review at this point.
Will pick it back up after you resubmit.

--
Manoj Kumar


Subject: [PATCH v4 12/18] cxl: Add guest-specific code
Date: Tue, 16 Feb 2016 22:39:05 +0100
From: Frederic Barrat 
To: imun...@au1.ibm.com, michael.neul...@au1.ibm.com,
m...@ellerman.id.au, linuxppc-dev@lists.ozlabs.org

From: Christophe Lombard 

The new of.c file contains code to parse the device tree to find out
about CAPI adapters and AFUs.

guest.c implements the guest-specific callbacks for the backend API.

The process element ID is not known until the context is attached, so
we have to separate the context ID assigned by the cxl driver from the
process element ID visible to the user applications. In bare-metal,
the 2 IDs match.

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
  drivers/misc/cxl/Makefile  |   1 +
  drivers/misc/cxl/api.c |   2 +-
  drivers/misc/cxl/context.c |   6 +-
  drivers/misc/cxl/cxl.h |  37 +-
  drivers/misc/cxl/file.c|   2 +-
  drivers/misc/cxl/guest.c   | 950
+
  drivers/misc/cxl/main.c|  18 +-
  drivers/misc/cxl/of.c  | 513 
  8 files changed, 1519 insertions(+), 10 deletions(-)
  create mode 100644 drivers/misc/cxl/guest.c
  create mode 100644 drivers/misc/cxl/of.c

diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index be2ac5c..a3d4bef 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -4,6 +4,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
  cxl-y += main.o file.o irq.o fault.o native.o
  cxl-y += context.o sysfs.o debugfs.o pci.o trace.o
  cxl-y += vphb.o api.o
+cxl-y += guest.o of.o hcalls.o
  obj-$(CONFIG_CXL) += cxl.o
  obj-$(CONFIG_CXL_BASE) += base.o

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 31eb842..325f957 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(cxl_start_context);

  int cxl_process_element(struct cxl_context *ctx)
  {
-return ctx->pe;
+return ctx->external_pe;
  }
  EXPORT_SYMBOL_GPL(cxl_process_element);

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 200837f..180c85a 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -95,8 +95,12 @@ int cxl_context_init(struct cxl_context *ctx, struct
cxl_afu *afu, bool master,
  return i;

  ctx->pe = i;
-if (cpu_has_feature(CPU_FTR_HVMODE))
+if (cpu_has_feature(CPU_FTR_HVMODE)) {
  ctx->elem = &ctx->afu->native->spa[i];
+ctx->external_pe = ctx->pe;
+} else {
+ctx->external_pe = -1; /* assigned when attaching */
+}
  ctx->pe_inserted = false;

  /*
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 3a1fabd..4372a87 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -433,6 +433,12 @@ struct cxl_irq_name {
  char *name;
  };

+struct irq_avail {
+irq_hw_number_t offset;
+irq_hw_number_t range;
+unsigned long   *bitmap;
+};
+
  /*
   * This is a cxl context.  If the PSL is in dedicated mode, there will
be one
   * of these per AFU.  If in AFU directed there can be lots of these.
@@ -488,7 +494,19 @@ struct cxl_context {

  struct cxl_process_element *elem;

-int pe; /* process element handle */
+/*
+ * pe is the process element handle, assigned by this driver when
the
+ * context is initialized.
+ *
+ * external_pe is the PE shown outside of cxl.
+ * On bare-metal, pe=external_pe, because we decide what the handle
is.
+ * In a guest, we only find out about the pe used by pHyp when the
+ * context is attached, and that's the value we want to report
outside
+ * of cxl.
+ */
+int pe;
+int external_pe;
+
  u32 irq_count;
  bool pe_inserted;
  bool master;
@@ -782,6 +800,7 @@ void cxl_pci_vphb_reconfigure(struct cxl_afu *afu);
  void cxl_pci_vphb_remove(struct cxl_afu *afu);

  extern struct pci_driver cxl_pci_driver;
+extern struct platform_driver cxl_of_driver;
  int afu_allocate_irqs(struct cxl_context *ctx, u32 count);

  int afu_open(struct inode *inode, struct

Re: Fwd: [PATCH v4 11/18] cxl: Separate bare-metal fields in adapter and AFU data structures

2016-02-22 Thread Frederic Barrat

Manoj,

Point taken. Those constants are all defined in the architecture 
document (CAIA). We should probably use more macros there.
However, since those were not introduced by this patch, I'll put it in 
my todo list for the future, but don't intend to address it in this 
patchset.


  Fred

Le 22/02/2016 02:14, Manoj Kumar a écrit :

Christophe, Fred: Perhaps none of these comments below are specific
to your patch, but clarification would help the next reviewer.

--
Manoj Kumar


Subject: [PATCH v4 11/18] cxl: Separate bare-metal fields in adapter and




-WARN_ON(afu->spa_size > 0x10); /* Max size supported by the
hardware */
+WARN_ON(afu->native->spa_size > 0x10); /* Max size supported by
the hardware */


Would prefer to see a MACRO defined, instead of the literal 0x100




  cxl_p1_write(adapter, CXL_PSL_ErrIVTE, 0x);


Same as above.



  p1n_base = p1_base(dev) + 0x1 + (afu->slice * p1n_size);


Same as above.



@@ -621,7 +622,7 @@ static int cxl_read_afu_descriptor(struct cxl_afu
*afu)
  afu->pp_size = AFUD_PPPSA_LEN(val) * 4096;


Both val and pp_size are 64bit quantities. Not clear how the overflow
during multiplication is going to be handled.



  afu->crs_len = AFUD_CR_LEN(val) * 256;


What do the 4096 and 256 represent?



  /* Convert everything to bytes, because there is NO WAY I'd look
at the
   * code a month later and forget what units these are in ;-) */
-adapter->ps_off = ps_off * 64 * 1024;
+adapter->native->ps_off = ps_off * 64 * 1024;
  adapter->ps_size = ps_size * 64 * 1024;
-adapter->afu_desc_off = afu_desc_off * 64 * 1024;
-adapter->afu_desc_size = afu_desc_size *64 * 1024;
+adapter->native->afu_desc_off = afu_desc_off * 64 * 1024;
+adapter->native->afu_desc_size = afu_desc_size * 64 * 1024;


Is this (64k) page size related?




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Fwd: [PATCH v4 08/18] cxl: IRQ allocation for guests

2016-02-22 Thread Frederic Barrat

Le 21/02/2016 23:30, Manoj Kumar a écrit :

Subject: [PATCH v4 08/18] cxl: IRQ allocation for guests
Date: Tue, 16 Feb 2016 22:39:01 +0100
From: Frederic Barrat 
To: imun...@au1.ibm.com, michael.neul...@au1.ibm.com,
m...@ellerman.id.au, linuxppc-dev@lists.ozlabs.org

The PSL interrupt is not going to be multiplexed in a guest, so an
interrupt will be allocated for it for each context.


Not clear why this is the case. Why cannot the CXL layer still
multiplex this in a guest? Is this a design choice, an
architectural issue, or the complexity of implementation did
not warrant this? From an API perspective it would have been
preferable to not cascade this change down to all consumers,
and have consumers aware whether they are working in a
bare-metal or a guest environment.



It was a design choice made by pHyp. We cannot multiplex the PSL 
interrupt with the current pHyp implementation.


But it doesn't affect the API: the behavior of the API specifying the 
number of interrupts for a context is consistent: the driver always 
expects the number of AFU interrupts on bare-metal and in a LPAR. The 
PSL interrupt is never included.


You can see a difference in the maximum number of attachable contexts 
between bare-metal and powerVM (if the limiting factor is the number of 
available interrupts). But there's no guarantee for that at the API level.


  Fred

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/pagetable: Add option to dump kernel pagetable

2016-02-22 Thread Anshuman Khandual
On 02/22/2016 11:32 AM, Rashmica Gupta wrote:
> Useful to be able to dump the kernel page tables to check permissions and
> memory types - derived from arm64's implementation.
> 
> Add a debugfs file to check the page tables. To use this the PPC_PTDUMP
> config option must be selected.
> 
> Tested on 64BE and 64LE with both 4K and 64K page sizes.
> ---

This statement above must be after the '---' line, else it will be part of
the commit message. Or did you want the test note as part of the commit
message itself?

The patch seems to contain some white space problems. Please clean them up.

>  arch/powerpc/Kconfig.debug |  12 ++
>  arch/powerpc/mm/Makefile   |   1 +
>  arch/powerpc/mm/dump.c | 364 
> +
>  3 files changed, 377 insertions(+)
>  create mode 100644 arch/powerpc/mm/dump.c
> 
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 638f9ce740f5..e4883880abe3 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -344,4 +344,16 @@ config FAIL_IOMMU
>  
> If you are unsure, say N.
>  
> +config PPC_PTDUMP
> +bool "Export kernel pagetable layout to userspace via debugfs"
> +depends on DEBUG_KERNEL
> +select DEBUG_FS
> +help
> +  This options dumps the state of the kernel pagetables in a debugfs
> +  file. This is only useful for kernel developers who are working in
> +  architecture specific areas of the kernel - probably not a good 
> idea to
> +  enable this feature in a production kernel.

Just clean the paragraph alignment here 
..

> +
> +  If you are unsure, say N.
> +
>  endmenu
> diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
> index 1ffeda85c086..16f84bdd7597 100644
> --- a/arch/powerpc/mm/Makefile
> +++ b/arch/powerpc/mm/Makefile
> @@ -40,3 +40,4 @@ obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
>  obj-$(CONFIG_HIGHMEM)+= highmem.o
>  obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o
>  obj-$(CONFIG_SPAPR_TCE_IOMMU)+= mmu_context_iommu.o
> +obj-$(CONFIG_PPC_PTDUMP) += dump.o

Would a file name like "[kernel_]pgtable_dump.c" sound better? Or
just use the x86 one, "dump_pagetables.c". "dump.c" sounds
very generic.

> diff --git a/arch/powerpc/mm/dump.c b/arch/powerpc/mm/dump.c
> new file mode 100644
> index ..937b10fc40cc
> --- /dev/null
> +++ b/arch/powerpc/mm/dump.c
> @@ -0,0 +1,364 @@
> +/*
> + * Copyright 2016, Rashmica Gupta, IBM Corp.
> + * 
> + * Debug helper to dump the current kernel pagetables of the system
> + * so that we can see what the various memory ranges are set to.
> + * 
> + * Derived from the arm64 implementation:
> + * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
> + * (C) Copyright 2008 Intel Corporation, Arjan van de Ven.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define PUD_TYPE_MASK   (_AT(u64, 3) << 0)
> +#define PUD_TYPE_SECT   (_AT(u64, 1) << 0)
> +#define PMD_TYPE_MASK   (_AT(u64, 3) << 0)
> +#define PMD_TYPE_SECT   (_AT(u64, 1) << 0)
> +
> + 
> +#if CONFIG_PGTABLE_LEVELS == 2
> +#include 
> +#elif CONFIG_PGTABLE_LEVELS == 3
> +#include 
> +#endif

Really? Do we have any platform with only 2 levels of page table?
 
> + 
> +#define pmd_sect(pmd)  ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> +#ifdef CONFIG_PPC_64K_PAGES
> +#define pud_sect(pud)   (0)
> +#else
> +#define pud_sect(pud)   ((pud_val(pud) & PUD_TYPE_MASK) == \
> +   PUD_TYPE_SECT)
> +#endif

Can you please explain the use of the pmd_sect() and pud_sect() defines?
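
My guess, based on the arm64 code this is derived from, is that the table
walker uses them to treat huge ("section") mappings as leaf entries instead
of descending further. A minimal sketch of that pattern (struct pg_state,
note_page() and walk_pte() are assumed names modelled on arm64's dump code,
not taken from this patch):

	static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start)
	{
		pmd_t *pmd = pmd_offset(pud, 0);
		unsigned long addr;
		unsigned int i;

		for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
			addr = start + i * PMD_SIZE;
			if (pmd_none(*pmd) || pmd_sect(*pmd))
				/* hole or huge mapping: record one leaf entry */
				note_page(st, addr, 3, pmd_val(*pmd));
			else
				/* regular table entry: descend to the PTE level */
				walk_pte(st, pmd, addr);
		}
	}

If that is the intent, a comment saying so would help, since powerpc has no
"section" mappings in the arm sense.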

> +  
> +
> +struct addr_marker {
> + unsigned long start_address;
> + const char *name;
> +};

All the architectures use the same addr_marker structure. Can't we just move
it to a generic header file? There are other common structures like this in
the file that are used across architectures and could be moved somewhere
common as well.
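
Something as small as this in a shared header would do (the file name is just
a suggestion, no such header exists today):

	/* include/asm-generic/ptdump.h (hypothetical) */
	struct addr_marker {
		unsigned long start_address;	/* first VA of the region */
		const char *name;		/* label printed in the dump */
	};

Each architecture would then only carry its own marker table.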

> +
> +enum address_markers_idx {
> + VMALLOC_START_NR = 0,
> + VMALLOC_END_NR,
> + ISA_IO_START_NR,
> + ISA_IO_END_NR,
> + PHB_IO_START_NR,
> + PHB_IO_END_NR,
> + IOREMAP_START_NR,
> + IOREMP_END_NR,
> +};

Where are these used? ^ I don't see them anywhere.

Also, as this dumps only the kernel virtual mapping, shouldn't we mention
that somewhere?
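
For comparison, x86's dump_pagetables.c keeps a similar enum so that
configuration-dependent addresses can be patched into the marker table at
init time, roughly like this (quoting the pattern from memory, not from this
patch):

	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;

If the enum here is not used for anything like that, it can simply be
dropped.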

> +
> +static struct addr_marker address_markers[] = {
> + { VMALLOC_START,"vmalloc() Area" },
> + { VMALLOC_END,  "vmalloc() End" },
> + { ISA_IO_BASE,  "isa I/O start"

Re: MAINTAINERS: Update EEH details and maintainership

2016-02-22 Thread Michael Ellerman
On Wed, 2016-17-02 at 06:06:04 UTC, Russell Currey wrote:
> Enhanced Error Handling could mean anything in the context of the entire
> kernel, so change the name to reference that it is both for PCI and
> powerpc.
> 
> EEH covers a bit more than the previously listed files, so add the headers
> and platform-specific code to the EEH maintained section.
> 
> In addition, I am taking over the maintainership.
> 
> Signed-off-by: Russell Currey 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/78c1cffdab7e5fc316be7d6a07

cheers

Re: Fix kgdb on little endian ppc64le

2016-02-22 Thread Michael Ellerman
On Mon, 2016-01-02 at 06:03:25 UTC, Balbir Singh wrote:
> From: Balbir Singh 
> 
> I spent some time trying to use kgdb and debugged my inability to
> resume from kgdb_handle_breakpoint(). NIP is not incremented
> and that leads to a loop in the debugger.
> 
> I've tested this lightly on a virtual instance with KDB enabled.
> After the patch, I am able to get the "go" command to work as
> expected.
> 
> Signed-off-by: Balbir Singh 
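
The usual fix for that symptom is to step the NIP past the breakpoint
instruction before resuming; roughly (BREAK_INSTR and BREAK_INSTR_SIZE come
from asm/kgdb.h, and this is an illustration, not the actual diff):

	if (*(u32 *)regs->nip == BREAK_INSTR)
		regs->nip += BREAK_INSTR_SIZE;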

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/94e3d923592fcfe5585c18a0af

cheers

Re: [v6, 1/4] atomics: Allow architectures to define their own __atomic_op_* helpers

2016-02-22 Thread Michael Ellerman
On Tue, 2015-15-12 at 14:24:14 UTC, Boqun Feng wrote:
> Some architectures may have their own special barriers for acquire, release
> and fence semantics, so the general memory barriers (smp_mb__*_atomic())
> used in the default __atomic_op_*() helpers may be too strong. Allow
> architectures to define their own helpers, which override the defaults.
> 
> Signed-off-by: Boqun Feng 
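
The mechanism is just #ifndef guards around the generic helpers, so an arch
header included earlier can provide its own. From memory (may not match the
merged code exactly):

	#ifndef __atomic_op_acquire
	#define __atomic_op_acquire(op, args...)				\
	({									\
		typeof(op##_relaxed(args)) __ret = op##_relaxed(args);	\
		smp_mb__after_atomic();	/* upgrade relaxed op to acquire */	\
		__ret;								\
	})
	#endif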

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e1ab7f39d7e0dbfbdefe148be3

cheers

Re: powerpc/mm: Fix HAVE_ARCH_SOFT_DIRTY dependencies

2016-02-22 Thread Ben Hutchings
On Mon, 2016-02-22 at 19:24 +1100, Michael Ellerman wrote:
> On Sat, 2016-20-02 at 17:58:37 UTC, Ben Hutchings wrote:
> > Soft dirty bit support was only implemented for 64-bit Book3S, and
> > 32-bit configurations currently fail to build.
> > 
> > Fixes: 7207f43665b8 ("powerpc/mm: Add page soft dirty tracking")
> > References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=4.5%7Erc4-1%7Eexp1&stamp=1455791718
> > Signed-off-by: Ben Hutchings 
> 
> Thanks for the patch Ben.
> 
> I merged a similar but not identical patch into my fixes branch last week, and
> Linus merged it over the weekend. Let me know if it isn't sufficient for you:
> 
>   https://git.kernel.org/torvalds/c/19f97c983071

Yes that works, thanks.

Ben.

-- 
Ben Hutchings
The generation of random numbers is too important to be left to chance.
- Robert Coveyou


[PATCH v2 2/2] powerpc/86xx: Switch to kconfig fragments approach

2016-02-22 Thread Alessio Igor Bogani
Signed-off-by: Alessio Igor Bogani 
---
 arch/powerpc/Makefile|  10 +
 arch/powerpc/configs/86xx-hw.config  | 106 ++
 arch/powerpc/configs/86xx-smp.config |   2 +
 arch/powerpc/configs/86xx/gef_ppc9a_defconfig| 234 ---
 arch/powerpc/configs/86xx/gef_sbc310_defconfig   | 234 ---
 arch/powerpc/configs/86xx/gef_sbc610_defconfig   | 234 ---
 arch/powerpc/configs/86xx/mpc8610_hpcd_defconfig | 233 --
 arch/powerpc/configs/86xx/mpc8641_hpcn_defconfig | 234 ---
 arch/powerpc/configs/86xx/sbc8641d_defconfig | 234 ---
 arch/powerpc/configs/mpc86xx_basic_defconfig |  10 +
 arch/powerpc/configs/mpc86xx_defconfig   | 162 
 11 files changed, 128 insertions(+), 1565 deletions(-)
 create mode 100644 arch/powerpc/configs/86xx-hw.config
 create mode 100644 arch/powerpc/configs/86xx-smp.config
 delete mode 100644 arch/powerpc/configs/86xx/gef_ppc9a_defconfig
 delete mode 100644 arch/powerpc/configs/86xx/gef_sbc310_defconfig
 delete mode 100644 arch/powerpc/configs/86xx/gef_sbc610_defconfig
 delete mode 100644 arch/powerpc/configs/86xx/mpc8610_hpcd_defconfig
 delete mode 100644 arch/powerpc/configs/86xx/mpc8641_hpcn_defconfig
 delete mode 100644 arch/powerpc/configs/86xx/sbc8641d_defconfig
 create mode 100644 arch/powerpc/configs/mpc86xx_basic_defconfig
 delete mode 100644 arch/powerpc/configs/mpc86xx_defconfig

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 96efd82..1a0ee01 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -310,6 +310,16 @@ corenet64_smp_defconfig:
$(call merge_into_defconfig,corenet_basic_defconfig,\
85xx-64bit 85xx-smp altivec 85xx-hw fsl-emb-nonhw)
 
+PHONY += mpc86xx_defconfig
+mpc86xx_defconfig:
+   $(call merge_into_defconfig,mpc86xx_basic_defconfig,\
+   86xx-hw fsl-emb-nonhw)
+
+PHONY += mpc86xx_smp_defconfig
+mpc86xx_smp_defconfig:
+   $(call merge_into_defconfig,mpc86xx_basic_defconfig,\
+   86xx-smp 86xx-hw fsl-emb-nonhw)
+
 define archhelp
   @echo '* zImage  - Build default images selected by kernel config'
  @echo '  zImage.*- Compressed kernel image (arch/$(ARCH)/boot/zImage.*)'
diff --git a/arch/powerpc/configs/86xx-hw.config b/arch/powerpc/configs/86xx-hw.config
new file mode 100644
index 000..1011584
--- /dev/null
+++ b/arch/powerpc/configs/86xx-hw.config
@@ -0,0 +1,106 @@
+CONFIG_ATA=y
+CONFIG_BLK_DEV_IDECS=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_BLK_DEV_SR=y
+CONFIG_BROADCOM_PHY=y
+# CONFIG_CARDBUS is not set
+CONFIG_CHR_DEV_SG=y
+CONFIG_CHR_DEV_ST=y
+CONFIG_CRC_T10DIF=y
+CONFIG_CRYPTO_HMAC=y
+CONFIG_DS1682=y
+CONFIG_EEPROM_LEGACY=y
+CONFIG_GEF_WDT=y
+CONFIG_GIANFAR=y
+CONFIG_GPIO_GE_FPGA=y
+CONFIG_GPIO_SYSFS=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_PANTHERLORD=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SUNPLUS=y
+CONFIG_HW_RANDOM=y
+CONFIG_HZ_1000=y
+CONFIG_I2C_MPC=y
+CONFIG_I2C=y
+CONFIG_IDE=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
+CONFIG_INPUT_FF_MEMLESS=m
+# CONFIG_INPUT_KEYBOARD is not set
+# CONFIG_INPUT_MOUSEDEV is not set
+# CONFIG_INPUT_MOUSE is not set
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_CFI_ADV_OPTIONS=y
+CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_MTD_CFI_INTELEXT=y
+CONFIG_MTD_CFI_LE_BYTE_SWAP=y
+CONFIG_MTD_CFI=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_JEDECPROBE=y
+CONFIG_MTD_NAND_FSL_ELBC=y
+CONFIG_MTD_NAND=y
+CONFIG_MTD_PHYSMAP_OF=y
+CONFIG_NETDEVICES=y
+CONFIG_NET_TULIP=y
+CONFIG_NVRAM=y
+CONFIG_PATA_ALI=y
+CONFIG_PCCARD=y
+CONFIG_PCI_DEBUG=y
+# CONFIG_PCIEASPM is not set
+CONFIG_PCIEPORTBUS=y
+CONFIG_PCI=y
+# CONFIG_PCMCIA_LOAD_CIS is not set
+# CONFIG_PPC_CHRP is not set
+# CONFIG_PPC_PMAC is not set
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_CMOS=y
+CONFIG_RTC_DRV_RX8581=y
+CONFIG_SATA_AHCI=y
+CONFIG_SATA_SIL24=y
+CONFIG_SATA_SIL=y
+CONFIG_SCSI_LOGGING=y
+CONFIG_SENSORS_LM90=y
+CONFIG_SENSORS_LM92=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_DETECT_IRQ=y
+CONFIG_SERIAL_8250_EXTENDED=y
+CONFIG_SERIAL_8250_MANY_PORTS=y
+CONFIG_SERIAL_8250_NR_UARTS=2
+CONFIG_SERIAL_8250_RSA=y
+CONFIG_SERIAL_8250_RUNTIME_UARTS=2
+CONFIG_SERIAL_8250_SHARE_IRQ=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIO_LIBPS2=y
+CONFIG_SND_INTEL8X0=y
+CONFIG_SND_MIXER_OSS=y
+CONFIG_SND_PCM_OSS=y
+# CONFIG_SND_SUPPORT_OLD_API is not set
+CONFIG_SND=y
+CONFIG_SOUND=y
+CONFIG_ULI526X=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_MON=y
+CONFIG_USB_OHCI_HCD_PPC_OF_BE=y
+CONFIG_USB_OHCI_HCD_PPC_OF_LE=y
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_STORAGE=y
+CONFIG_USB=y
+CONFIG_VITESSE_PHY=y
+CONFIG_VME_BUS=y
+CONFIG_VME_TSI148=y
+CONFIG_WATCHDOG=y
+# CONFIG_YENTA_

[PATCH v2 1/2] powerpc/86xx: Update defconfigs

2016-02-22 Thread Alessio Igor Bogani
This patch shows how the defconfigs appear if the kconfig fragment approach
is used.

Signed-off-by: Alessio Igor Bogani 
---
v1 -> v2
Split changes in two patches as suggested by Scott Wood

 arch/powerpc/configs/86xx/gef_ppc9a_defconfig| 208 +++---
 arch/powerpc/configs/86xx/gef_sbc310_defconfig   | 212 +++---
 arch/powerpc/configs/86xx/gef_sbc610_defconfig   | 277 ---
 arch/powerpc/configs/86xx/mpc8610_hpcd_defconfig | 157 +--
 arch/powerpc/configs/86xx/mpc8641_hpcn_defconfig |  92 ++-
 arch/powerpc/configs/86xx/sbc8641d_defconfig | 336 +++
 6 files changed, 735 insertions(+), 547 deletions(-)

diff --git a/arch/powerpc/configs/86xx/gef_ppc9a_defconfig b/arch/powerpc/configs/86xx/gef_ppc9a_defconfig
index 9792a2c..4ffbc4f 100644
--- a/arch/powerpc/configs/86xx/gef_ppc9a_defconfig
+++ b/arch/powerpc/configs/86xx/gef_ppc9a_defconfig
@@ -2,30 +2,49 @@ CONFIG_SMP=y
 CONFIG_NR_CPUS=2
 CONFIG_SYSVIPC=y
 CONFIG_POSIX_MQUEUE=y
+CONFIG_FHANDLE=y
+CONFIG_AUDIT=y
+CONFIG_IRQ_DOMAIN_DEBUG=y
+CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_BSD_PROCESS_ACCT=y
-CONFIG_BSD_PROCESS_ACCT_V3=y
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=14
-CONFIG_RELAY=y
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_CPUSETS=y
+CONFIG_CGROUP_CPUACCT=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
-CONFIG_SLAB=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_PERF_EVENTS=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
+CONFIG_MODULE_FORCE_UNLOAD=y
+CONFIG_MODVERSIONS=y
 # CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_MAC_PARTITION=y
 # CONFIG_PPC_CHRP is not set
 # CONFIG_PPC_PMAC is not set
 CONFIG_PPC_86xx=y
+CONFIG_MPC8641_HPCN=y
+CONFIG_SBC8641D=y
+CONFIG_MPC8610_HPCD=y
 CONFIG_GEF_PPC9A=y
+CONFIG_GEF_SBC310=y
+CONFIG_GEF_SBC610=y
 CONFIG_HIGHMEM=y
 CONFIG_HZ_1000=y
-CONFIG_PREEMPT=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_BINFMT_MISC=m
+CONFIG_KEXEC=y
+CONFIG_FORCE_MAX_ZONEORDER=13
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 # CONFIG_PCIEASPM is not set
+CONFIG_PCI_DEBUG=y
 CONFIG_PCCARD=y
 # CONFIG_PCMCIA_LOAD_CIS is not set
 # CONFIG_CARDBUS is not set
@@ -36,8 +55,11 @@ CONFIG_YENTA=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
-CONFIG_XFRM_USER=m
-CONFIG_NET_KEY=m
+CONFIG_XFRM_USER=y
+CONFIG_XFRM_SUB_POLICY=y
+CONFIG_XFRM_STATISTICS=y
+CONFIG_NET_KEY=y
+CONFIG_NET_KEY_MIGRATE=y
 CONFIG_INET=y
 CONFIG_IP_MULTICAST=y
 CONFIG_IP_ADVANCED_ROUTER=y
@@ -48,72 +70,79 @@ CONFIG_IP_PNP=y
 CONFIG_IP_PNP_DHCP=y
 CONFIG_IP_PNP_BOOTP=y
 CONFIG_IP_PNP_RARP=y
-CONFIG_NET_IPIP=m
+CONFIG_NET_IPIP=y
 CONFIG_IP_MROUTE=y
 CONFIG_IP_PIMSM_V1=y
 CONFIG_IP_PIMSM_V2=y
-CONFIG_SYN_COOKIES=y
-CONFIG_INET_AH=m
-CONFIG_INET_ESP=m
-CONFIG_INET_IPCOMP=m
+CONFIG_INET_AH=y
+CONFIG_INET_ESP=y
+CONFIG_INET_IPCOMP=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
 # CONFIG_INET_XFRM_MODE_BEET is not set
-CONFIG_INET6_AH=m
-CONFIG_INET6_ESP=m
-CONFIG_INET6_IPCOMP=m
-CONFIG_IPV6_TUNNEL=m
-CONFIG_NET_PKTGEN=m
+# CONFIG_INET_LRO is not set
+CONFIG_IP_SCTP=m
 CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
 CONFIG_MTD=y
+CONFIG_MTD_CMDLINE_PARTS=y
 CONFIG_MTD_BLOCK=y
+CONFIG_FTL=y
 CONFIG_MTD_CFI=y
 CONFIG_MTD_JEDECPROBE=y
+CONFIG_MTD_CFI_ADV_OPTIONS=y
+CONFIG_MTD_CFI_LE_BYTE_SWAP=y
 CONFIG_MTD_CFI_INTELEXT=y
 CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_MTD_PHYSMAP_OF=y
-CONFIG_BLK_DEV_LOOP=m
-CONFIG_BLK_DEV_CRYPTOLOOP=m
-CONFIG_BLK_DEV_NBD=m
+CONFIG_MTD_NAND=y
+CONFIG_MTD_NAND_FSL_ELBC=y
+CONFIG_MTD_UBI=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_NBD=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_SIZE=131072
 CONFIG_DS1682=y
+CONFIG_EEPROM_LEGACY=y
 CONFIG_IDE=y
 CONFIG_BLK_DEV_IDECS=y
 CONFIG_BLK_DEV_SD=y
 CONFIG_CHR_DEV_ST=y
 CONFIG_BLK_DEV_SR=y
+CONFIG_CHR_DEV_SG=y
+CONFIG_SCSI_LOGGING=y
 CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_SATA_SIL24=y
 CONFIG_SATA_SIL=y
+CONFIG_PATA_ALI=y
 CONFIG_NETDEVICES=y
-CONFIG_BONDING=m
-CONFIG_DUMMY=m
-CONFIG_NETCONSOLE=y
-CONFIG_TUN=m
+CONFIG_DUMMY=y
+CONFIG_NET_TULIP=y
+CONFIG_ULI526X=y
 CONFIG_GIANFAR=y
-CONFIG_PPP=m
-CONFIG_PPP_BSDCOMP=m
-CONFIG_PPP_DEFLATE=m
-CONFIG_PPP_FILTER=y
-CONFIG_PPP_MULTILINK=y
-CONFIG_PPPOE=m
-CONFIG_PPP_ASYNC=m
-CONFIG_PPP_SYNC_TTY=m
-CONFIG_SLIP=m
-CONFIG_SLIP_COMPRESSED=y
-CONFIG_SLIP_SMART=y
-CONFIG_SLIP_MODE_SLIP6=y
+CONFIG_VITESSE_PHY=y
+CONFIG_BROADCOM_PHY=y
+CONFIG_FIXED_PHY=y
+CONFIG_INPUT_FF_MEMLESS=m
+# CONFIG_INPUT_MOUSEDEV is not set
 # CONFIG_INPUT_KEYBOARD is not set
 # CONFIG_INPUT_MOUSE is not set
-# CONFIG_SERIO is not set
+CONFIG_SERIO_LIBPS2=y
 # CONFIG_LEGACY_PTYS is not set
 CONFIG_SERIAL_8250=y
 CONFIG_SERIAL_8250_CONSOLE=y
-# CONFIG_SERIAL_8250_PCI is not set
 CONFIG_SERIAL_8250_NR_UARTS=2
 CONFIG_SERIAL_8250_RUNTIME_UARTS=2
+CONFIG_SERIAL_8250_EXTENDED=y
+CONFIG_SERIAL_8250_MANY_PORTS=y
+CONFIG_SERIAL_8250_SHARE_IRQ=y
+CONFIG_SERIAL_8250_DETECT_IRQ=y
+CONFIG_SERIAL_8250_RSA=y
 CONFIG_HW_RANDOM=y
 CONFIG_NVRAM=y

Re: powerpc/mm: Fix HAVE_ARCH_SOFT_DIRTY dependencies

2016-02-22 Thread Michael Ellerman
On Sat, 2016-20-02 at 17:58:37 UTC, Ben Hutchings wrote:
> Soft dirty bit support was only implemented for 64-bit Book3S, and
> 32-bit configurations currently fail to build.
> 
> Fixes: 7207f43665b8 ("powerpc/mm: Add page soft dirty tracking")
> References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=4.5%7Erc4-1%7Eexp1&stamp=1455791718
> Signed-off-by: Ben Hutchings 

Thanks for the patch Ben.

I merged a similar but not identical patch into my fixes branch last week, and
Linus merged it over the weekend. Let me know if it isn't sufficient for you:

  https://git.kernel.org/torvalds/c/19f97c983071

cheers